ABSTRACT

We show that redshift-space distortions of galaxy correlations have a strong effect on correlation functions 
with distinct, localized features, like the signature of the baryon acoustic oscillations (BAO). Near the line 
of sight, the features become sharper as a result of redshift-space distortions. We demonstrate this effect by 
measuring the correlation function in Gaussian simulations and the Millennium Simulation. We also analyze 
the SDSS DR7 main-galaxy sample (MGS), splitting the sample into slices 2.5° on the sky in various rotations. 
Measuring 2D correlation functions in each slice, we do see a sharp bump along the line of sight. Using 
Mexican-hat wavelets, we localize it to $(110 \pm 10)\,h^{-1}$ Mpc. Averaging only along the line of sight, we estimate
its significance at a particular wavelet scale and location at $2.2\sigma$. In a flat angular weighting in the $(\pi, r_p)$
coordinate system, the noise level is suppressed, pushing the bump's significance to $4\sigma$. We estimate that
there is about a 0.2% chance of getting such a signal anywhere in the vicinity of the BAO scale from a power 
spectrum lacking a BAO feature. However, these estimates of the significances make some use of idealized 
Gaussian simulations, and thus are likely a bit optimistic. 

Subject headings: cosmology: large-scale structure of Universe - methods: data analysis 



1. INTRODUCTION 

In a wide range of cosmological models, acoustic oscilla- 
tions were generated due to competition between the pressure 
exerted by the photons and the gravitational collapse of per- 
turbations in the density of baryons in the relativistic baryon- 
photon plasma of the early universe prior to the epoch of 
recombination (Silk 1968; Peebles & Yu 1970; Sunyaev &
Zeldovich 1970; Bond & Efstathiou 1984, 1987; Holtzman
1989). These are imprinted not only on the temperature power
spectrum of the cosmic microwave background (CMB), as de- 
tected convincingly for the first time around the turn of the 
millennium (de Bernardis et al. 2000; Hanany et al. 2000),
but also on the power spectrum of matter and galaxies. The 
Baryon Acoustic Oscillation (BAO) feature was first convinc- 
ingly detected in the correlation function of Luminous Red 
Galaxies (LRGs) in the Sloan Digital Sky Survey (SDSS) by 
Eisenstein et al. (2005). Further evidence for the BAO has
been found subsequently, in the SDSS and other galaxy surveys
(Cole et al. 2005; Padmanabhan et al. 2007; Percival et al. 2007, 2010).

Some of these papers used correlation functions, others 
used power spectra. Correlation functions have both advan- 
tages and disadvantages over power spectra. For example, 
when there are quasi-harmonic features, like the BAO, forming
a sequence of peaks in $k$-space, they can add to a much
stronger feature in the correlation function since the harmonics
all add to the same peak in $\xi$. A disadvantage of the
correlation function is that unlike the power spectrum, it suf- 
fers from substantial off-diagonal covariances, even on large 
scales (e.g. Hamilton 2009). However, it has recently been 
shown that even the power spectrum is not entirely immune 
from covariance effects in the mildly nonlinear regime (Rimes
& Hamilton 2005; Neyrinck et al. 2006).

The correlation function estimated from a lower-dimensional subset
of a homogeneous, isotropic random field is identical to the one
estimated from the full three-dimensional field. This can
also be extended to density fluctuations with line-of-sight 
anisotropies caused by redshift space distortions. Over the 
last two decades there were many redshift surveys, which had 
pencil-beam- or slice-like geometries (e.g. de Lapparent et al.;
Broadhurst et al.; Landy et al. 1996; ... 2003; Weiner et al. 2005),
making optimal use of the multi-fiber capabilities of the telescope
over a limited field of view. In any case, all these geometries 
should lead to the same redshift space correlation function. 
However, there are subtle effects due to the geometry entering 
the covariance of the correlation function. 

The statistical analysis of significance for correlation functions
is a notoriously difficult task (Kaiser & Peacock 1991),
since the different bins of the correlation function are always 
correlated to one another. The different modes in the power 
spectrum are more independent, as long as the survey win- 
dow is large enough, making estimation in many ways easier 
(Tegmark et al. 1998). For anisotropic windows the shape of
the independent "grains" in $k$-space can be very elongated,
leading to complications, like mixing modes over a wide 
range of scales. This is particularly true for pencil-beams or 
slices, where power spectra have rarely been used success- 
fully to characterize structure. 

Because the behavior of the BAO at low redshift only 
slightly departs from the predictions of linear cosmological 
perturbation theory, the BAO signature has recently attracted 
attention as a powerful "standard cosmological ruler." It is 
a useful probe to explore the nature of dark energy or large- 
distance modifications of gravity, i.e. explanations of the ob- 
served accelerated expansion of the Universe. 

There is a recent controversy about the reality of a nar- 
row peak at the BAO scales along the line of sight (LOS) in 
the redshift-space power spectrum of the SDSS LRG sample
(Gaztañaga et al. 2009). Gravitational lensing is mentioned
as a possible origin of this signal, but this explanation has
been disputed (Miralda-Escudé 2009). Kazin et al. (2010),
using a suite of realistic LRG mock catalogs, conclude that
the LOS peak is likely a statistical fluke. Recently, Cabré &
Gaztañaga (2010) reached a similar conclusion about the significance
of BAO detections, but argued that even a modestly
significant peak can be used to constrain cosmological param- 
eters, under the reasonable assumption that the galaxy power 
spectrum in the Universe contains a BAO feature. 

In these papers, the statistical significance of a BAO fea- 
ture is assessed by comparing data to models without BAO; 
for example, if a correlation function is better-fit by such a 
"no-wiggle" model, the features are judged to be spurious. 
Here, we test a different technique, detecting peaks with a 
Mexican-hat wavelet. Although it seems more forgiving of 
modest BAO-like peaks than the standard approach, we show 
that it still has considerable power to constrain the BAO location.

In this paper, we revisit the linear theory of redshift space 
distortions, and show that for power spectra with distinct 
spectral features, redshift space distortions will have a char- 
acteristic sharpening effect on these, especially along the line 
of sight, even in linear theory. In Section 2, we demonstrate 
how different geometric sampling strategies of a given sur- 
vey volume can lead to strong, but quite different covariances 
in the resulting correlation functions. In Section 3, we de- 
scribe our wavelet-based technique for BAO peak detection, 
and measure its statistical and systematic uncertainties using 
simple Gaussian simulations. In Section 4, we analyze a sample
of galaxies from the Millennium Simulation (MS; Springel
et al. 2005) using the wavelet technique. In Section 5, we
build a sample of SDSS DR7 (Abazajian et al. 2009) main-galaxy
sample galaxies, and analyze the LOS and 2D correlation
functions using a methodology similar to the simulations:
we subdivide the sample into many thin slices, compute the 
2D redshift space correlation function and calculate the aver- 
age. In Section 6, we apply the wavelet peak-finding formal- 
ism to the SDSS results. Finally, in the conclusion, we discuss 
our findings. 

2. REDSHIFT SPACE CORRELATIONS IN LINEAR THEORY 
2.1. Amplifying the BAO Features 



The seminal paper of Kaiser (1987) laid out the framework
to compute the redshift space distortions of power spectra,
in the plane-parallel limit, when the two galaxies are very
close to the same LOS. Later, Hamilton (1992) and Hamilton
& Culhane (1996) worked out explicit expressions for
the correlation function. Heavens & Taylor (1995) considered
the problem for all-sky redshift surveys using a spherical-harmonic
analysis.

Szalay et al. (1998, hereafter SML98) used bipolar spherical
harmonics to extend the previous calculations of the correlation
function to arbitrary angles between the two lines
of sight, and computed explicit expressions for the distorted
correlation function in different directions. Their work
was further extended by Szapudi (2004) and Pápai & Szapudi
(2008), using a slightly different coordinate system, and including
contributions from a previously neglected term in the
Jacobian of the real- to redshift-space mapping. Scoccimarro
(2004) included various non-linear effects, in particular contributions
from the nonlinear pairwise velocity distributions.

The basic papers laying out the theory of redshift-space distortions
were written in the 1980s and 1990s, when the generally
accepted assumption was that the cosmological dark matter
power spectrum is smooth. Here we would like to show that

Fig. 1. The two-dimensional correlation function for a power spectrum
with a BAO feature. Note how the bump gets sharper, but also lower, as it
approaches the LOS. The function plotted is $\sinh^{-1}(300\,\xi(\pi, r_p))$, linear near
zero, but logarithmic for high values of $\xi$. Units on both axes are $h^{-1}$ Mpc.

one of the effects of the redshift space distortions on the cor- 
relation function is a sharpening of "bumpy" features, in di- 
rections close to the LOS. 

Following SML98, we can write the redshift space correlation
function as a function of the pairwise distance $r$, and
the angle $\theta$ ($\gamma$ in SML98), using the expression for the plane-parallel
limit (especially applicable near the LOS), as

$$ \xi(r, \theta) = (1 + 2\beta/3 + \beta^2/5)\,\xi_0(r) - (4\beta/3 + 4\beta^2/7)\,\xi_2(r)\,P_2(\cos\theta) + (8\beta^2/35)\,\xi_4(r)\,P_4(\cos\theta), \tag{1} $$

where $P_n(x)$ is the Legendre polynomial of order $n$, $\beta$ is the
usual redshift space distortion parameter, $\beta = \Omega^{0.6}/b$, where $b$
is the bias factor, and $\xi_L(r)$ is the $L$-th spherical Bessel transform
of the isotropic, real-space power spectrum $P(k)$,

$$ \xi_L(r) = \frac{1}{2\pi^2} \int dk\, k^2\, j_L(kr)\, P(k). \tag{2} $$



The three-dimensional angular average of Eq. (1) is $\xi_{3D} =
(1 + 2\beta/3 + \beta^2/5)\,\xi_0(r)$, using a $\sin\theta\, d\theta$ weighting; only the
isotropic $\xi_0$ contributes due to the orthogonality of the Legendre
polynomials. However, such a weighting suppresses contributions
from close to the LOS, where $\sin\theta \to 0$. If $\xi(\pi, r_p)$ is
averaged with a flat (without the $\sin\theta$ factor) angular weighting,
appropriate for a 2D sample, there are also contributions
from $\xi_2$ and $\xi_4$:

$$ \xi_{2D}(r) = (1 + 2\beta/3 + \beta^2/5)\,\xi_0(r) - (\beta/3 + \beta^2/7)\,\xi_2(r) + (9\beta^2/280)\,\xi_4(r). \tag{3} $$
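
The coefficients in Eq. (3) follow from averaging the Legendre polynomials of Eq. (1) over angle with a flat weight: the average of $P_2(\cos\theta)$ is 1/4 and that of $P_4(\cos\theta)$ is 9/64, whereas both vanish under the $\sin\theta$ weight. The following is a minimal numerical check of that statement, not part of the original analysis; the function names and the midpoint-rule integration are illustrative choices.

```python
import math

def legendre(n, x):
    """Legendre polynomials P_0, P_2, P_4 in closed form."""
    if n == 0:
        return 1.0
    if n == 2:
        return 0.5 * (3.0 * x**2 - 1.0)
    if n == 4:
        return (35.0 * x**4 - 30.0 * x**2 + 3.0) / 8.0
    raise ValueError("only n = 0, 2, 4 are needed here")

def angular_average(n, weight, steps=20000):
    """Midpoint-rule weighted average of P_n(cos theta) over theta in [0, pi]."""
    dtheta = math.pi / steps
    num = den = 0.0
    for i in range(steps):
        theta = (i + 0.5) * dtheta
        w = weight(theta)
        num += w * legendre(n, math.cos(theta))
        den += w
    return num / den

def flat(theta):
    return 1.0

# Averages with the flat (2D) weight and with the sin(theta) (3D) weight.
avg_flat = {n: angular_average(n, flat) for n in (0, 2, 4)}
avg_sin = {n: angular_average(n, math.sin) for n in (0, 2, 4)}

def xi_2d_coefficients(beta):
    """Coefficients multiplying xi_0, xi_2, xi_4 in the flat-weighted average."""
    c0 = 1.0 + 2.0 * beta / 3.0 + beta**2 / 5.0
    c2 = -(4.0 * beta / 3.0 + 4.0 * beta**2 / 7.0) * avg_flat[2]
    c4 = (8.0 * beta**2 / 35.0) * avg_flat[4]
    return c0, c2, c4
```

For any $\beta$, `xi_2d_coefficients(beta)` should reproduce the factors $-(\beta/3 + \beta^2/7)$ and $9\beta^2/280$ of Eq. (3) to within the integration accuracy.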

Figure 1 shows contours of the linear redshift-space correlation
function $\xi(\pi, r_p)$ derived from a power spectrum with
a BAO bump at $107\,h^{-1}$ Mpc. In linear theory, the BAO feature
sharpens near the LOS. Here $\pi$ and $r_p$ are galaxy separations
projected parallel and perpendicular to the LOS. We use
a $\sinh^{-1}(300\xi)$ transform for easy visualization for both large
and small $\xi$. The $\sinh^{-1}(x)$ function is approximately $x$ for $|x| \ll 1$, and
approximately $\ln(2x)$ for $x \gg 1$.

Fig. 2. The two-dimensional linear theory correlation function is shown
along the LOS ($\pi$), along the transverse direction ($r_p$), together with two
angular averages of the correlation function. The 3D curve uses a $\sin\theta$ weighting,
where $\theta$ is the angle away from the LOS in the $(\pi, r_p)$ coordinate system.
The 2D curve, which has a sharper bump, uses a flat (without the $\sin\theta$)
weighting. The lower panel shows the various $\xi$'s, after multiplying by $r^2$.
The LOS correlation function, $\xi(\pi)$, can be written as

$$ \xi(\pi) = (1 + 2\beta/3 + \beta^2/5)\,\xi_0(\pi) - (4\beta/3 + 4\beta^2/7)\,\xi_2(\pi) + (8\beta^2/35)\,\xi_4(\pi). \tag{4} $$

The term independent of $\beta$ is just the isotropic, real-space
correlation function. The term linear in $\beta$ can be written as
$(2\beta/3)[\xi_0(\pi) - 2\xi_2(\pi)]$, which inside the $k$-integrals behaves as
$j_0(k\pi) - 2 j_2(k\pi) = -3 j_0''(k\pi)$, where $j_0''$ is the second derivative of the
spherical Bessel function. Such operators were explicitly shown
in Eq. (4) of Hamilton (1992). Applying minus the second derivative
to a Gaussian, and adding it to the function with a positive
weight, will sharpen the peak, as seen in Fig. 2. The sharpening
is much weaker far from the LOS, in the transverse ($r_p$)
direction. The other effect of redshift-space compression is an
overall smooth shift towards negative values along the LOS,
and towards positive values in the transverse direction.
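
The Bessel-function identity used above, $j_0(x) - 2 j_2(x) = -3 j_0''(x)$, can be checked numerically. The sketch below uses the closed forms of $j_0$ and $j_2$ and a central finite difference for the second derivative; the step size and the sampled range of $x$ are arbitrary illustrative choices.

```python
import math

def j0(x):
    """Spherical Bessel function of order 0."""
    return math.sin(x) / x

def j2(x):
    """Spherical Bessel function of order 2."""
    return (3.0 / x**2 - 1.0) * math.sin(x) / x - 3.0 * math.cos(x) / x**2

def j0_second_derivative(x, h=1e-4):
    """Central finite-difference estimate of d^2 j_0 / dx^2."""
    return (j0(x + h) - 2.0 * j0(x) + j0(x - h)) / h**2

def max_identity_residual(xs):
    """Largest |j_0 - 2 j_2 + 3 j_0''| over the sampled points."""
    return max(abs(j0(x) - 2.0 * j2(x) + 3.0 * j0_second_derivative(x)) for x in xs)

xs = [0.5 + 0.1 * i for i in range(200)]
residual = max_identity_residual(xs)
```

The residual should be limited only by the finite-difference error, which is why the term linear in $\beta$ acts like (minus) a second derivative along the LOS.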

In Fig. 2 we also show the results if $\xi$ is multiplied by $r^2$, in
which case the peaks at different angles line up in scale more
obviously. This could be because the slopes on which the
peaks find themselves at different angles are much more similar
in $r^2\xi$ than in $\xi$ itself. This is an example illustrating how
using the absolute position of $\xi$'s local maximum, without regard
to the slope it is on, can produce a small bias. A peak
locator that estimates the second derivative, as our wavelet
transform does, is insensitive to such slopes.

Fig. 3. $\sinh^{-1}(300\xi)$ from an Eisenstein & Hu (1998) linear no-wiggle
power spectrum, with and without linear redshift-space distortions. The
contours are at -1, 0, and 1.

Fig. 3 shows an Eisenstein & Hu (1998) no-wiggle linear
$\xi(\pi, r_p)$ in real and redshift space. Note that the trough along
the LOS at $\pi \approx 55\,h^{-1}$ Mpc appears even in the no-wiggle
case, and thus seems unrelated to the BAO feature in linear
theory. However, as shown below in Fig. 21, the trough is
highly amplified in samples with full nonlinearities (in the
SDSS and MS samples); we suspect that this could be a result
of non-linear infall toward the BAO ridge.

For every pair of galaxies, the observer and the two galax- 
ies define a plane, and the whole problem is invariant under 
any rotation of this plane around the observer, located at the 
origin. Thus the anisotropic redshift space correlation func- 
tion is inherently planar, and in 3D it has an axial symmetry 
for rotations around the LOS. This means that one can also 
estimate the same 2D correlation function from an arbitrar- 
ily thin slice of the data (though if one goes too thin, shot 
noise will swamp the signal). From the projection-slicing the- 
orem, described below, this is equivalent to first projecting 
the three-dimensional redshift-space power spectrum down to 
two dimensions, and then performing a two-dimensional inverse
Fourier transform. This has been noted in SML98. Com- 
puting the anisotropic redshift space correlation function this 
way does not violate any of the underlying symmetries of the 
problem. However, it impacts the covariances of the estimated 
correlation function, as we will show. 

2.2. Projection-Slicing Theorem 

There is a well-known theorem in signal processing and medical
imaging, stating a relation between the Fourier transform
of a lower dimensional subset of a multidimensional scalar 
field and the original Fourier transform. 

The projection-slicing theorem states that the Fourier transform
of the projection of an $N$-dimensional scalar function $f$
onto an $m$-dimensional subspace is equal to an $m$-dimensional
slice, going through the origin, of the $N$-dimensional Fourier
transform of $f$. For example, if a 3D $\delta$ is projected to 2D
along the $z$ direction, the Fourier transform of the 2D projection
will be on the $k_z = 0$ plane of the 3D Fourier transform of $\delta$.
Symbolically,

$$ F_m P_m = S_m F_N, \tag{5} $$

where $F_k$ denotes the $k$-dimensional Fourier transform, $P_m$
denotes a projection onto the $m$-dimensional subspace, and
$S_m$ denotes slicing of the $m$-dimensional subspace. This technique
will be useful for the calculations below.
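
As an illustration of the theorem, the following sketch checks the discrete analogue on a small random 2D array: the 1D Fourier transform of the projection along one axis equals the slice of the 2D Fourier transform through the origin along the conjugate axis. The array size, the random field, and the direct-sum DFT are assumptions made for the example only.

```python
import cmath
import random

def dft1(values):
    """Direct-sum 1D discrete Fourier transform (fine for small inputs)."""
    n = len(values)
    return [sum(values[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
            for k in range(n)]

def dft2(field):
    """2D DFT of field[y][z], returned as F[ky][kz]."""
    n_y, n_z = len(field), len(field[0])
    rows = [dft1(row) for row in field]                       # transform along z
    cols = [dft1([rows[y][kz] for y in range(n_y)]) for kz in range(n_z)]
    return [[cols[kz][ky] for kz in range(n_z)] for ky in range(n_y)]

random.seed(1)
n = 8
delta = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]

# Project the 2D field along y, leaving a 1D function of z, then transform.
projection = [sum(delta[y][z] for y in range(n)) for z in range(n)]
ft_of_projection = dft1(projection)

# Slice the 2D transform through the origin, along k_y = 0.
slice_of_ft = dft2(delta)[0]

max_diff = max(abs(a - b) for a, b in zip(ft_of_projection, slice_of_ft))
```

Here `max_diff` should vanish to rounding error, which is the statement $F_1 P_1 = S_1 F_2$ for this toy case.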

We can compute the redshift space correlation function in 
several different ways. We can take the full 3D redshift space 
correlation function, and perform the axial averaging using
the LOS as the symmetry axis to get the 2D $\xi(\pi, r_p)$. Alternatively,
we can compute the 2D correlation function directly
from each slice in an ensemble, and then perform the averag- 
ing in 2D. 

In the case of LOS correlations, one can do either of the
above procedures, then take the LOS, or one can even directly
extract all possible pencil-beams from the volume and just
compute the 1D correlation function. In the extreme, near the
LOS, all three cases will contain the same galaxy pairs, never- 
theless the covariances between the correlation function bins 
estimated in different ways will be somewhat different due 
to lateral correlations between neighboring slices and pencil- 
beams. In the following subsections we will illustrate how to 
compute the covariance of the correlation functions, at least 
in the linear limit, where the three results are quite intuitive, 
and show the subtle differences between the outcomes. Fur- 
thermore, even though the calculation below is only for the 
LOS correlation function, it can be easily extended for the 
covariance of the full 2D redshift space correlation function. 

2.3. Estimating the covariance of $\xi_{\rm LOS}(r)$ from pencil-beams

We ignore the effects of shot noise, and only estimate the 
variance and covariance of the correlation function from the 
power spectrum. Consider first a single pencil-beam, aligned 
with the z-axis, drawn randomly from a cubic volume with pe- 
riodic boundary conditions, like the case of the N-body simu- 
lations we analyzed in this paper. The overdensity is measured 
in N discrete cells, along each axis of the cube. The estimator 
for the LOS correlation function is 

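$$ \hat{\xi}_1(r) = \frac{1}{N} \sum_i \delta(r_i)\,\delta(r_i + r), \tag{6} $$

where $\delta(r_i)$ is the dimensionless overdensity in the $i$th cell
along the LOS. The expectation value is $\langle \hat{\xi}_1(r) \rangle = \xi_1(r)$.
We can compute the expectation value

$$ \langle \hat{\xi}_1(r)\,\hat{\xi}_1(r') \rangle = \frac{1}{N^2} \sum_{i,j} \langle \delta(r_i)\,\delta(r_i + r)\,\delta(r_j)\,\delta(r_j + r') \rangle, \tag{7} $$

and the covariance

$$ C_1(r, r') = \langle \hat{\xi}_1(r)\,\hat{\xi}_1(r') \rangle - \langle \hat{\xi}_1(r) \rangle \langle \hat{\xi}_1(r') \rangle. \tag{8} $$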

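If we stay in the Gaussian (linear) limit, there are no higher
order, irreducible contributions, thus the expectation value of
the product of four overdensities can be factored as

$$ \langle \delta_1 \delta_2 \delta_3 \delta_4 \rangle = \langle \delta_1 \delta_2 \rangle \langle \delta_3 \delta_4 \rangle + \langle \delta_1 \delta_3 \rangle \langle \delta_2 \delta_4 \rangle + \langle \delta_1 \delta_4 \rangle \langle \delta_2 \delta_3 \rangle. \tag{9} $$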

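With $r_j = r_i + s$, we can write the expectation value as

$$ \langle \hat{\xi}_1(r)\,\hat{\xi}_1(r') \rangle = \frac{1}{N^2} \sum_{i,s} \Big[ \langle \delta(r_i)\,\delta(r_i + r) \rangle \langle \delta(r_i + s)\,\delta(r_i + s + r') \rangle + \langle \delta(r_i)\,\delta(r_i + s) \rangle \langle \delta(r_i + r)\,\delta(r_i + s + r') \rangle + \langle \delta(r_i)\,\delta(r_i + s + r') \rangle \langle \delta(r_i + r)\,\delta(r_i + s) \rangle \Big]. \tag{10} $$

After the summation over the index $i$, we obtain

$$ \langle \hat{\xi}_1(r)\,\hat{\xi}_1(r') \rangle = \xi_1(r)\,\xi_1(r') + \frac{1}{N} \sum_s \Big[ \xi_1(s)\,\xi_1(s + r' - r) + \xi_1(s + r')\,\xi_1(s - r) \Big]. \tag{11} $$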

The first term is just the trivial product of the expectation values.
The next terms are the correlations of the correlation
function, which we will denote as $Z_1$, a symmetric function of
its argument, as

$$ Z_1(z) = \frac{1}{N} \sum_s \xi_1(s)\,\xi_1(s + z). \tag{12} $$

We can compute $Z_1$ in the limit of infinitesimal cells, using
the Fourier transform of the one-dimensional $\xi_1(z)$, denoted
as $\Pi_1(k_z)$. Due to the projection-slicing theorem, the one-dimensional
power spectrum corresponding to the correlation
function along the pencil-beam is the projection of the three-dimensional
power spectrum $P$, anisotropic due to the redshift
space distortions along the $z$-axis:

$$ \Pi_1(k_z) = \frac{1}{(2\pi)^2} \int dk_x\, dk_y\, P(k_x, k_y, k_z), \tag{13} $$

and

$$ Z_1(z) = \frac{1}{2\pi} \int dk_z\, e^{i k_z z}\, |\Pi_1(k_z)|^2. \tag{14} $$

In summary, we can write the covariance of the 1D correlation
function as estimated from a single pencil-beam as

$$ C_1(r, r') = Z_1(r - r') + Z_1(r + r'). \tag{15} $$
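
To make Eqs. (12) and (15) concrete, the sketch below evaluates $Z_1$ by direct summation on a periodic grid and assembles the covariance matrix $C_1(r, r')$ for a handful of lags. The toy correlation function (a smooth core plus a small bump), the grid size, and the chosen lags are hypothetical and serve only to show how strongly the bins are correlated.

```python
import math

N = 64  # hypothetical number of cells along the periodic pencil-beam

def xi_toy(lag):
    """Toy periodic 1D correlation function: smooth core plus a small bump."""
    d = min(lag % N, N - lag % N)          # periodic separation
    core = math.exp(-0.5 * (d / 4.0) ** 2)
    bump = 0.1 * math.exp(-0.5 * ((d - 20) / 2.0) ** 2)
    return core + bump

def z1(z):
    """Z_1(z) = (1/N) sum_s xi_1(s) xi_1(s + z), as in Eq. (12)."""
    return sum(xi_toy(s) * xi_toy(s + z) for s in range(N)) / N

def c1(r, r_prime):
    """Gaussian-limit covariance of the pencil-beam estimator, as in Eq. (15)."""
    return z1(r - r_prime) + z1(r + r_prime)

lags = list(range(0, 32, 4))
cov = [[c1(r, rp) for rp in lags] for r in lags]

# Correlation coefficients between bins of the estimated correlation function.
corr = [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(len(lags))]
        for i in range(len(lags))]
```

Neighbouring lags in `corr` come out strongly correlated for this toy input, which is the behaviour exploited in the wavelet-variance discussion later in this section.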

2.4. The covariance of $\xi_{\rm LOS}(z)$ for independent slices

Next, we randomly select a thin density slice from a cubic
volume, and estimate the two-dimensional correlation function.
We essentially repeat the derivation above, except we
have two indices: $i$ is along the LOS, and $j$ is along the slice:

$$ \hat{\xi}_2(s, r) = \frac{1}{N^2} \sum_{i,j} \delta(r_j, r_i)\,\delta(r_j + s, r_i + r). \tag{16} $$

The LOS correlation function is the special case with $s = 0$.
Following the previous calculation, we obtain the equivalent
2D covariance as

$$ C_2(r, r') = Z_2(0, r - r') + Z_2(0, r + r'), \tag{17} $$

with $Z_2$ now related to $\Pi_2(k_y, k_z)$, the 2D projection of the 3D
power spectrum:

$$ \Pi_2(k_y, k_z) = \frac{1}{2\pi} \int dk_x\, P(k_x, k_y, k_z), \tag{18} $$

$$ Z_2(s, r) = \frac{1}{(2\pi)^2} \int dk_y\, dk_z\, e^{i(k_y s + k_z r)}\, |\Pi_2(k_y, k_z)|^2. \tag{19} $$

Along the LOS, $Z_2(0, r)$ can be written as:

$$ Z_2(0, r) = \frac{1}{2\pi} \int dk_z\, e^{i k_z r} \int \frac{dk_y}{2\pi}\, |\Pi_2(k_y, k_z)|^2. \tag{20} $$

This is the projection-slicing theorem at work again: we are
'slicing' the 2D super-correlation function at $s = 0$, and therefore
we need to project its power spectrum down to the $z$-axis. This
differs, however, from the previous result: there, the projection
was applied to the power spectrum before the square
was taken. Here we performed one projection before and one
after taking the square of the power spectrum.

2.5. The covariance from the average of slices 

We repeat the calculation of the LOS correlation function as 
above, but now we average over the whole three-dimensional 
cube. We can perform this by subdividing the volume into 
a set of adjacent slices, computing the correlation function 
of each slice, then averaging them over the whole set. The
derivation is quite similar, except now we have a 3D estimator
of the LOS correlation function. We will just present the final
result:

$$ \hat{\xi}_3(0, 0, r) = \frac{1}{N^3} \sum_{i,j,k} \delta(r_k, r_j, r_i)\,\delta(r_k, r_j, r_i + r), \tag{21} $$

$$ C_3(r, r') = Z_3(0, 0, r - r') + Z_3(0, 0, r + r'), \tag{22} $$

with

$$ Z_3(0, 0, r) = \frac{1}{2\pi} \int dk_z\, e^{i k_z r} \int \frac{dk_x\, dk_y}{(2\pi)^2}\, |P(k_x, k_y, k_z)|^2. \tag{23} $$

Here all projection takes place after squaring the anisotropic
3D power spectrum. Even without evaluating any of the integrals,
one can see that the covariances will be the largest in
this third case, since the quadratic mean is always at least as large as
the arithmetic one. So if, at a given $k_z$, we project the square
of the 3D power spectrum, we always get a larger number than if
some of the projection takes place before taking the square.
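
The ordering of the three cases can be illustrated with a toy calculation in which each projection is treated as a plain average over a small grid of transverse wavenumbers (so the volume factors of the true integrals are ignored). The power spectrum below is a made-up positive, anisotropic function, not the one used in this paper.

```python
import math

n = 16
ks = range(-n // 2, n // 2)

def toy_power(kx, ky, kz):
    """Hypothetical positive, anisotropic 'power spectrum' on a small grid."""
    k2 = kx**2 + ky**2 + kz**2 + 1.0
    mu2 = kz**2 / k2
    return (1.0 + 0.5 * mu2) ** 2 / k2 * (1.0 + 0.2 * math.sin(3.0 * math.sqrt(k2)))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pencil_term(kz):
    """Pencil-beam case: project over (kx, ky) first, then square."""
    return mean(toy_power(kx, ky, kz) for kx in ks for ky in ks) ** 2

def slice_term(kz):
    """Single-slice case: project over kx, square, then project over ky."""
    return mean(mean(toy_power(kx, ky, kz) for kx in ks) ** 2 for ky in ks)

def volume_term(kz):
    """Slice-average case: square first, then project over (kx, ky)."""
    return mean(toy_power(kx, ky, kz) ** 2 for kx in ks for ky in ks)

ordering_holds = all(
    pencil_term(kz) <= slice_term(kz) <= volume_term(kz) for kz in ks
)
```

With averages in place of integrals, the inequality is just the statement that the mean of a square is never smaller than the square of the mean, applied once or twice.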

2.6. Effect of strong covariances on the estimated errors 

We can write a linear transform of correlation function values
in bins as $a = \sum_i u_i x_i$, where $u_i$ are the coefficients of the
linear transformation, and $x_i$ are the binned values of the 1D
correlation function. Assuming that the transform has a finite
support, the sum is over a small, finite number of bins. The
expectation value of $a$ can be written as

$$ \langle a \rangle = \sum_i u_i \langle x_i \rangle. \tag{24} $$

For a wavelet transform, the vector $u$ contains some positive
and negative coefficients, so that their sum is equal to 0. The
expectation value of $a^2$ is

$$ \langle a^2 \rangle = \sum_{i,j} u_i u_j \langle x_i x_j \rangle, \tag{25} $$

and the variance is

$$ {\rm Var}(a) = \langle a^2 \rangle - \langle a \rangle^2 = \sum_{i,j} u_i u_j \left[ \langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle \right]. \tag{26} $$

Consider a simple wavelet of length 3: $u = (-0.5, 1, -0.5)$.
The covariance of the correlation function is dominated by
the $Z(r - r')$ term. Since this is a symmetric function of its
argument, it is reasonable to approximate it with an inverse
parabola $Z(r) \approx (1 - a r^2)$, taking also unit diagonal variance.
Expanding the covariance matrix up to second order in $r$,

$$ Z = \begin{pmatrix} 1 & 1 - a & 1 - 4a \\ 1 - a & 1 & 1 - a \\ 1 - 4a & 1 - a & 1 \end{pmatrix}, \tag{27} $$

then the variance is given by $u^T Z u = 0$. The covariances of
the LOS correlation function are quite high. Thus, applying 
a similar, but much wider wavelet can potentially reduce the 
variance considerably compared to the nominal diagonal vari- 
ances, although it will never go to zero as in this extreme ex- 
ample. This Section only serves as an illustration of the effect 
of the strong covariances, and the importance of the particu- 
lars of the sampling strategy chosen. 
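
The cancellation in the three-bin example is easy to verify directly. The sketch below evaluates $u^T Z u$ for the parabolic covariance and, for comparison, for uncorrelated bins of unit variance; the curvature value `a` is an arbitrary illustrative number.

```python
def wavelet_variance(u, cov):
    """Var(a) = u^T C u for a = sum_i u_i x_i, as in Eq. (26)."""
    n = len(u)
    return sum(u[i] * cov[i][j] * u[j] for i in range(n) for j in range(n))

def parabolic_covariance(n_bins, a):
    """Unit-variance covariance with Z(r) = 1 - a r^2 between bins."""
    return [[1.0 - a * (i - j) ** 2 for j in range(n_bins)] for i in range(n_bins)]

u = [-0.5, 1.0, -0.5]
a = 0.05                                    # hypothetical curvature of Z(r)

# Strongly correlated bins: the wavelet variance cancels.
var_correlated = wavelet_variance(u, parabolic_covariance(3, a))

# Uncorrelated bins of unit variance: no cancellation.
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
var_independent = wavelet_variance(u, identity)
```

For independent bins the variance is the sum of squared coefficients, 1.5, whereas the parabolic covariance gives zero up to rounding, independent of `a`.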

Let us describe what this means in simple terms. Consider
first a set of independent, uncorrelated slices randomly drawn
from an infinite sample, and denote the variance of a wavelet
coefficient at some scale over this ensemble as $\sigma^2$. This variance
will already be smaller than the variance of the individual
bin values of the correlation function, since the covariance
among the bins, $C_2(z, z')$, reduces the wavelet variance.

Next, let us draw a set of adjacent slices from a coherent
cubical volume, and compute the average wavelet coefficient
over this sample. The covariance of the estimator $\hat{\xi}_3$ now
becomes much stronger, $C_3(z, z')$; thus, the variance of the
wavelet coefficient will be much smaller.

If we took the average over the same number of slices from
the independent ensemble, the standard deviation would be reduced by
a factor of $\sqrt{N_{\rm slices}}$, due to the central limit theorem. For $\hat{\xi}_3$, though,
estimating the variance among the slices assuming they are
independent will be an underestimate of the true variance. But
this independent-slice estimate can be used as a definite bound on the expected variance
reduction from the slice-averaging process. At the end of Section
5.4.1 we demonstrate this behavior using Gaussian simulations.

3. QUANTIFYING THE REDSHIFT SPACE FEATURES 
3.1. Peak Location from a Mexican Hat Wavelet 

We measure the sharpness of peaks in the correlation func- 
tion using a Mexican hat wavelet, similar in spirit to the 
(differently shaped) wavelet technique proposed by Xu et al. 
(2010). A transform using a Mexican hat wavelet, a second 
derivative of a Gaussian, provides a measurement of (minus) 
the second derivative of a function, estimated over the scale 
radius of the wavelet. Before discussing the Mexican hat it- 
self, we describe estimators of the zeroth and first derivatives 
of a function, also using derivatives of a Gaussian. These es- 
timators will turn out to be useful in visualizing the sharpness 
of the BAO bump as a function of angle. 

Define a Gaussian of scale radius $s$, normalized to equal
1 at its center (bump position $r_b$), as

$$ G_0(r_b, s; r) = \exp\left[ -(r - r_b)^2 / 2s^2 \right]. \tag{28} $$

(The arguments before the semicolon are really parameters of
the function itself, which acts on the arguments after the semicolon.)
An estimate of the mean of $\xi(\pi, r_p) = \xi(r, \theta)$ (i.e. the
zeroth derivative) within a radius $s$ of $r_b$ is given by the average
$\langle G_0(r_b, s)\,\xi \rangle = \int G_0(r_b, s; r)\,\xi(r, \theta)\,dr \,/ \int G_0(r_b, s; r)\,dr$.
Here $\theta$ is the angle away from the LOS.

To estimate the radial first derivative within a radius $s$ of $r_b$,
we use the following, where $\rho = (r - r_b)/s$:

$$ G_1(r_b, s; r) = \rho\, G_0(r_b, s; r), \tag{29} $$

$$ \langle G_1(r_b, s)\,\xi \rangle = \frac{\int \rho\, G_0(r_b, s; r)\,\xi(r, \theta)\,dr}{\int \rho^2\, G_0(r_b, s; r)\,dr}. \tag{30} $$

We use $\langle\,\rangle$ to denote a generalized average: in the denominator,
the terms multiplying $G_0$ in the integrand are squared.

An estimate of the second derivative is $\langle G_2(r_b, s)\,\xi \rangle$, with the
Mexican hat wavelet,

$$ G_2(r_b, s; r) = (1 - \rho^2)\, G_0(r_b, s; r). \tag{31} $$

The wavelet transform is given by

$$ \langle G_2(r_b, s)\,\xi \rangle = \frac{\int G_2(r_b, s; r)\,\xi(r, \theta)\,dr}{\int (1 - \rho^2)^2\, G_0(r_b, s; r)\,dr}. \tag{32} $$

Numerically, we evaluate and integrate the numerator
on the same $(\pi, r_p)$ grid on which $\xi$ is measured, multiplying
the integrand by $1/r$ to take out the $r$ weighting from the 2D
integration. Note that whenever we use the transform, we divide
means by standard deviations, so in fact the normalization
(in the denominator) drops out.

We note that in practice, we analyze correlation functions
only at $r > 0$, so that although $\int_{-\infty}^{\infty} G_2(r_b, s; r)\,dr = 0$,
$\int_0^{\infty} G_2(r_b, s; r)\,dr \neq 0$. However, this problem is negligible if
$r_b > 3s$, and we confine our analysis to such cases.
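
A one-dimensional sketch of the Mexican hat transform may help fix ideas. It applies the transform to a toy correlation function with a Gaussian bump, with and without an added linear background, to show that the background does not change the result when $r_b > 3s$. The bump position and width, the background, the integration range, and the denominator convention (squaring the factor multiplying $G_0$) are assumptions of this example, not values taken from the analysis.

```python
import math

def g0(r_b, s, r):
    """Gaussian of scale radius s centred on r_b, equal to 1 at its centre."""
    return math.exp(-((r - r_b) ** 2) / (2.0 * s**2))

def g2(r_b, s, r):
    """Mexican hat wavelet: (1 - rho^2) times the Gaussian."""
    rho = (r - r_b) / s
    return (1.0 - rho**2) * g0(r_b, s, r)

def mexican_hat_transform(xi, r_b, s, r_max=300.0, dr=0.05):
    """Generalized average <G_2 xi>, integrating over 0 < r < r_max only."""
    num = den = 0.0
    steps = int(r_max / dr)
    for i in range(steps):
        r = (i + 0.5) * dr
        rho = (r - r_b) / s
        num += g2(r_b, s, r) * xi(r) * dr
        den += (1.0 - rho**2) ** 2 * g0(r_b, s, r) * dr
    return num / den

def bump(r):
    """Toy correlation function: a Gaussian bump near 107 (arbitrary units)."""
    return 0.01 * math.exp(-((r - 107.0) ** 2) / (2.0 * 8.0**2))

def bump_on_slope(r):
    """Same bump sitting on a constant plus linear background."""
    return bump(r) + 0.02 - 1.0e-4 * r

w_bump = mexican_hat_transform(bump, 107.0, 10.0)
w_slope = mexican_hat_transform(bump_on_slope, 107.0, 10.0)
w_off = mexican_hat_transform(bump, 60.0, 10.0)
```

In this toy case `w_bump` and `w_slope` agree closely, while `w_off`, evaluated away from the bump, is much smaller in magnitude.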

3.2. Flattening transformation with background subtraction 

The redshift-space $\xi(r, \theta)$ has a strong quadrupole
anisotropy, making it difficult to see the strength of the peak at
different angles. To make a visual detection of the peak easier,
we flatten $\xi$ at a given radius $r_b$ by estimating, and then subtracting,
the zeroth and first derivatives with respect to both
radius and angle. This is the background subtracted correlation
function, shown in Fig. 17. The kernels $G_{02}$ and $G_{12}$ are
sensitive to the angular quadrupole:

$$ G_{02}(r_b, s; r, \theta) = q_2(\theta)\, G_0(r_b, s; r); \tag{33} $$

$$ G_{12}(r_b, s; r, \theta) = \rho\, q_2(\theta)\, G_0(r_b, s; r). \tag{34} $$

Here $q_2(\theta) = \cos^2\theta - 1/2$, designed to be zero at 45°.

The following flattening transformation enhances the visual
contrast of features within a radial distance $s$ of $r_b$, essentially
removing the constant and linear terms of a Taylor expansion
of $\xi$ in $r$ and $q_2$.

$$ \tilde{\xi}(r_b, s; \pi, r_p) = \xi(r, \theta) - \left[ \langle G_0\,\xi \rangle + q_2(\theta) \langle G_{02}\,\xi \rangle + \rho \langle G_1\,\xi \rangle + \rho\, q_2(\theta) \langle G_{12}\,\xi \rangle \right]. \tag{35} $$



Importantly, though, the Mexican hat is insensitive to linear 
and constant terms in r, so the flattening transformation leaves 
the Mexican hat transform unchanged for any (r b , s). Thus, 
the flattening transformation helps one to see bump-like fea- 
tures visually, but does not affect the quantitative assessment 
of them. When we plot the flattened correlation functions be- 
low, we multiply £(r b , s; n, r p ) by Go(r b , s) to emphasize the 
region to which the wavelet is most sensitive. 

3.3. Wavelet transforms of Gaussian fields 

Our method is to estimate the BAO scale by finding the
peak in the signal-to-noise ratio S/N of the wavelet transform.
At each $(r_b, s)$, this S/N is the mean divided by the
standard deviation, over samples $i$, of the wavelet transform
$\langle G_2(r_b, s)\,\xi_i(r, \theta) \rangle$.

We first investigate the statistics and systematics of the
BAO wavelet estimator using Gaussian simulations, with
mode amplitudes drawn from a Rayleigh distribution about
the mean linear power spectrum, and using random phases.
The linear power spectrum is from CAMB (Lewis et al. 2000),
using the same cosmological parameters as the Millennium
Simulation. The Gaussian simulations have a box size of
$768\,h^{-1}$ Mpc, and a cell size of $2\,h^{-1}$ Mpc. Linear redshift
distortions are put on the density fields using Eq. (36). We
generated 1400 Gaussian simulations for each case below.
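
The recipe for a Gaussian realization (Rayleigh-distributed mode amplitudes about the mean power spectrum, uniformly random phases) can be sketched in one dimension as follows. This is a toy illustration only: the power spectrum `toy_power`, the grid size, and the direct-sum inverse transform are hypothetical stand-ins for the 3D CAMB-based simulations described above.

```python
import cmath
import math
import random

def toy_power(k):
    """Hypothetical smooth power spectrum (not the CAMB spectrum of the text)."""
    return 0.0 if k == 0 else 1.0 / (1.0 + (k / 4.0) ** 2)

def gaussian_realization(n, rng):
    """Real 1D field on n cells with Rayleigh amplitudes and random phases."""
    modes = [0j] * n
    for k in range(1, n // 2):
        # Rayleigh-distributed |delta_k| with <|delta_k|^2> = P(k).
        amplitude = math.sqrt(-toy_power(k) * math.log(1.0 - rng.random()))
        phase = rng.uniform(0.0, 2.0 * math.pi)
        modes[k] = amplitude * cmath.exp(1j * phase)
        modes[n - k] = modes[k].conjugate()   # Hermitian symmetry gives a real field
    # Inverse DFT by direct summation (adequate for small n).
    return [sum(modes[k] * cmath.exp(2j * math.pi * k * x / n) for k in range(n))
            for x in range(n)]

rng = random.Random(42)
field = gaussian_realization(64, rng)
max_imag = max(abs(v.imag) for v in field)
mean_value = sum(v.real for v in field) / len(field)
```

The zero mode and the Nyquist mode are left at zero here for simplicity, so the field has zero mean and is real to rounding error.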

Figs. 4 and 5 show the wavelet S/N from this ensemble of
simulations, with and without slicing. Linear redshift distortions
are induced along the LOS $x$-axis, and two slicings are
taken, along the $y$ and $z$ axes. The slice thickness used is
$16\,h^{-1}$ Mpc, which is the physical slice thickness of the SDSS
slices at a distance of about $370\,h^{-1}$ Mpc.

Each 4-plot figure shows the S/N in real and redshift space,
and weighting $\xi$ with two different angular weightings in the
$(\pi, r_p)$ plane: along the LOS (within 6° of the LOS, $\mu^2 =
\cos^2\theta > 0.99$); and with flat weighting in $\theta$. In no case do
we include the ($\sin\theta$) weighting typically used in angular averages
of $\xi$ for 3D samples. Such a weighting would be inappropriate
for the 2D slices that we consider. Also, we do not
wish to kill the signal along the LOS.

Fig. 4. The signal-to-noise ratio of the wavelet transform
$\langle G_2(r_b, s)\,\xi \rangle$ applied to the full 3D correlation function of Gaussian
realizations of volume $(768\,h^{-1}\,{\rm Mpc})^3$, with and without linear redshift
distortions applied. The S/N assumes that a single box volume is being analyzed.

Fig. 5. Same as Fig. 4, for averaged 2D slice correlation functions. The
S/N assumes that a full box volume is being analyzed.

In all cases, the S/N quoted is that expected from a single 
simulation box, i.e. it is the mean divided by the standard de- 
viation of the wavelet transform, measured over the 1400 sim- 
ulation boxes. Note that along the LOS, the S/N is enhanced 
for the averages of 2D slices. This is even while missing some 
diagonal modes in the box, since we use only 2 slicing orien- 
tations (along axes). Most of the modes very close to the LOS 
are picked up in at least one of the two orientations, but in 
the flat weighting, many diagonal modes are missing. This 
is likely why, with flat weighting, the S/N degrades with the 
slicings. 
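
The S/N definition used here (ensemble mean divided by the simulation-to-simulation standard deviation at each point of the wavelet grid) can be written as a short sketch. The array layout and the name `ensemble_snr` are assumptions made for illustration.

```python
import numpy as np

def ensemble_snr(coeffs):
    """S/N expected from a single box.

    coeffs: array of shape (n_sims, n_rb, n_s) holding the wavelet
    coefficient of each simulation at each (r_b, s). Hypothetical layout.
    """
    mean = coeffs.mean(axis=0)
    std = coeffs.std(axis=0, ddof=1)  # simulation-to-simulation scatter
    return mean / std
```

Because the standard deviation is that of individual boxes (not of the ensemble mean), the result is the significance expected from one box, as stated in the text.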

Figs. 6 and 7 show the analogs of the previous two figures,
using no-wiggle power spectra. The features detected
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Fig. 6. — Same as Fig. [4] except where the Gaussian simulations are gener- 
ated using a no-wiggle power spectrum. 
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Fig. 7. — Same as Fig. [4] for averaged 2D slice correlation functions, and 
where the Gaussian simulations are generated using a no-wiggle power spec- 
trum. 

in the no-wiggle case are generally small in amplitude. However,
there is a ridge with S/N $\approx$ 0.24 reaching from ($r_b$ =
75 $h^{-1}$ Mpc, $s$ = 15 $h^{-1}$ Mpc) to increasing ($r_b$, $s$) along the
LOS, for the redshift-distorted $\xi_{\rm LOS}$. This ridge is likely from
where the dip at $\pi \approx 55$ $h^{-1}$ Mpc flattens out. This is a
region where the second derivative is substantially negative, but
where the first derivative is positive (thus it is not, strictly
speaking, a peak). There are also peaks at $s$ = 2 $h^{-1}$ Mpc with
fairly high significance. These are excluded from the
peak-finding below, since they lie along an edge of the searched
prior in $s$.

A proper test against the null hypothesis of no BAO feature 
would use the variance from a no-wiggle power spectrum, but 
instead, below we estimate variances from the sample itself. 
Thus it is relevant to ask how the BAO feature affects the
standard deviation of the wavelet transform. Fig. 8 shows the ratio
of the noise with and without the BAO feature, for both LOS
and flat weighting. Typically, the ratio is within 10% of unity,
with only very slight fluctuations in $r_b$.
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Fig. 8. — The ratio of the noise N, i.e. the standard deviation of the wavelet 
as a function of ($r_b$, $s$), for Gaussian simulations with and without a BAO
feature.

3.4. Peak-finding Statistics and Systematics in Gaussian 
fields 

In each simulation, we find the maximum of its wavelet
transform over a grid ($r_b \in [85, 135]$, $s \in [2, 26]$), with a grid
spacing of 2 $h^{-1}$ Mpc. We require that peaks not be along
a rectangle-edge in $s$ or $r_b$; a peak along an edge might
indicate that a true peak would be outside the area searched.
Thus, the actual possible range for peaks is a bit smaller,
($r_b \in [87, 133]$, $s \in [4, 24]$). In Bayesian language, this is
the range of our flat prior. The error bars that we quote in $r_b$
are marginalized over this flat prior in $s$.
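
The peak search with edge exclusion can be sketched as below. It assumes one particular reading of the procedure: take the global maximum on the searched grid and discard the realization if that maximum sits on an edge. The function name and return convention are hypothetical.

```python
import numpy as np

# Searched grid, 2 h^-1 Mpc spacing, as quoted in the text.
rb_grid = np.arange(85, 136, 2)  # 85, 87, ..., 135
s_grid = np.arange(2, 27, 2)     # 2, 4, ..., 26

def find_interior_peak(wavelet_map, rb_values=rb_grid, s_values=s_grid):
    """Return (r_b, s, value) of the global maximum, or None if it lies on an edge.

    wavelet_map has shape (len(rb_values), len(s_values)).
    """
    i, j = np.unravel_index(np.argmax(wavelet_map), wavelet_map.shape)
    on_edge = (i in (0, wavelet_map.shape[0] - 1)
               or j in (0, wavelet_map.shape[1] - 1))
    if on_edge:
        return None
    return rb_values[i], s_values[j], wavelet_map[i, j]
```

With this grid, accepted peaks automatically fall in $r_b \in [87, 133]$ and $s \in [4, 24]$, matching the range of the flat prior.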

One might worry about covariance as a function of ($r_b$, $s$) in the
S/N plot, and whether it affects our estimate of either the
significance, or location, of the bump. Certainly, the wavelet
coefficients at nearby ($r_b$, $s$) are correlated (though less so than
in the raw $\xi$). The smoothness of the S/N plots is a sign of
this. But this correlation does not affect the interpretation of
the S/N: at ($r_b$, $s$), it is a good measure of the confidence that
a bump exists there, specifically that the Mexican hat wavelet
coefficient is positive at that ($r_b$, $s$). It does not matter that
nearby in $r_b$ and $s$, the S/N is likely quite similar. Covariance
in the wavelet coefficient also does not affect the peak location
estimate, except that a particular wavelet choice could
produce a broader-than-optimal peak in the S/N plot. We have
not attempted to optimize or orthogonalize our wavelet, but
as we find below, the statistical and systematic errors on the
true bump location in Gaussian simulations are still small.

3.4.1. Statistical errors 



Figs. 9, 10, 11 and 12 show 2D histograms of peak positions
for individual redshift-distorted realizations in the ensembles
whose mean S/N plots appear on the right-hand sides of Figs.
4, 5, 6 and 7. The case most relevant to our SDSS
measurements is shown in Fig. 10, in which averaged 2D slice
correlation functions, drawn from simulations with BAO features,
are analyzed.

In the presence of a BAO feature, the (posterior) distributions
of peak $r_b$'s after peak-finding are tightened considerably
compared to the prior. The set of $r_b$'s in the flat prior
has a standard deviation of 14 $h^{-1}$ Mpc, which contracts to
8 $h^{-1}$ Mpc when looking along the LOS, and to 2 $h^{-1}$ Mpc (we
conservatively round up) in the case of flat weighting. In
contrast, if there is no BAO feature, the posterior distribution of
$r_b$ is hardly narrower than the prior. In the left panel of Fig.
10, there are a couple of curious accumulations of very narrow
peaks, along the $s = 4$ edge. However, they do not appear to
be related to the peaks in the cases with BAO features. True
features are somewhat extended in $s$; we suspect that these
narrow features are simply noise.

As recently stated by Cabré & Gaztañaga (2010), the
model-comparison question of whether a BAO feature exists
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Fig. 9. — A 2D histogram of the peak in (>•(,, s) of (Giiry, s)^j(r,8)) mea- 
sured in 1400 Gaussian simulations, divided by its simulation-to-simulation
standard deviation. Here $\xi$ is the full 3D correlation function, with linear
redshift distortions applied. With LOS and flat weightings, the peaks have
locations 111.2 ± 9.0 and 110.7 ± 1.6 $h^{-1}$ Mpc.
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Fig. 10. — Same as Fig. [9] for averaged 2D slice correlation functions. The 
S/N assumes that a full box volume is being analyzed. With LOS and flat
weightings, the peaks have locations 111.1 ± 7.9 and 110.2 ± 1.8 $h^{-1}$ Mpc.
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Fig. 11. — Same as Fig. [9] except where the Gaussian simulations are gen- 
erated using a no-wiggle power spectrum. With LOS and flat weightings, the
peaks have locations 109.3 ± 14.1 and 113.8 ± 12.2 $h^{-1}$ Mpc.

in a given sample is quite different than the question of its 
constraining power. Under the reasonable assumption that the 
power spectrum underlying the structure in our Universe has 
a BAO feature, even a low-significance bump gives substan- 
tial constraining power. For example, if there happens to be 
a LOS wavelet S/N peak in a sample, we find that the LOS
weighting gives an 8 $h^{-1}$ Mpc error bar, even though about
half of the mocks possess no LOS peak at all (see Fig. 13).

One might also wonder whether error ellipses on the peak
$r_b$ and $s$ can be inferred directly from the S/N plots.
Comparing the S/N plots to the distributions of peaks in Figs. 9,
10, 11 and 12, it seems that there is only a loose relationship.
For the two to correspond exactly, the probability of a given
($r_b$, $s$) having a global peak would have to be simply related
to its local S/N value. Given the additional complication of
correlations in the S/N plot, it is perhaps not surprising that 
they are not simply related. One curious difference is that 
peaks seem more prevalent at small s than the mean S/N plots 
would suggest, especially for flat angular weighting. 

3.4.2. Frequency of maximum S/N 



Fig. 12. — Same as Fig. 9 for averaged 2D slice correlation functions, and
where the Gaussian simulations are generated using a no-wiggle power
spectrum. With LOS and flat weightings, the peaks have locations 109.2 ± 13.6
and 114.1 ± 12.6 $h^{-1}$ Mpc.
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Fig. 13. — Cumulative frequencies of maximum wavelet S/N in Gaussian 
simulations of approximately the volume of the SDSS sample, in various 
cases. For the green curves, simulations were generated using a camb power 
spectrum, and for the black, a no-wiggle power spectrum was used. 

The previous section concerned the expectation value of the 
S/N in the wavelet transform. But if the peak in S/N is broad, 
the maximum S/N measured in an individual realization will 
tend to exceed the maximum in the mean, since nearby coef- 
ficients may fluctuate independently, and each might produce 
the observed peak. 

Fig. 13 shows the cumulative frequency over 1400 Gaussian
simulations of the maximum S/N in various scenarios. For
this plot, peaks in $r_b$ were sought in a narrower range of $r_b$
than in the previous sections, more indicative of the peaks
actually observed in the SDSS sample. As previously, peaks
were excluded if they were along an edge in $s$ or $r_b$; the actual
range of peaks allowed was 100 to 120 $h^{-1}$ Mpc. The fraction
of simulations with a true peak (not on an edge of the prior)
can be read off as the fraction with S/N = 0.

From this figure, we see that if the underlying power
spectrum has a BAO feature, using our method it is not
uncommon to get a peak up to 3$\sigma$ from a Gaussian sample with the
SDSS volume, even along the LOS. Below we estimate the
S/N of our bump detection at the peak ($r_b$, $s$) to be 2.2 (LOS
weighting) and 4.0 (flat weighting). If the underlying power
spectrum has a BAO feature, about 40% of the simulations
had a LOS peak of S/N > 2.0, and about 70% had a peak with
flat weighting of S/N > 4.0.

On the other hand, if the underlying power spectrum does 
not have a BAO feature (i.e. in the null hypothesis of BAO 
peak detection), it becomes much more uncommon to get such 
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Fig. 14. — A linear £(r) from camb (with peak at 109.5 h Mpc), along 
with the difference between the camb and the no-wiggle linear correlation
functions (with peak at 110.3 $h^{-1}$ Mpc).

peaks. About 11% of no-wiggle simulations had a LOS peak
of S/N > 2.2, and about 0.2% of no-wiggle simulations had a
peak with flat weighting of S/N > 4.0.
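
The quoted percentages are exceedance fractions of the per-simulation maximum S/N. A minimal sketch of that calculation follows; the convention that a simulation with no interior peak is assigned S/N = 0 is an assumption, inferred from the description of Fig. 13.

```python
import numpy as np

def exceedance_fraction(max_snr, threshold):
    """Fraction of simulations whose maximum S/N exceeds `threshold`.

    max_snr: one value per simulation; simulations with no interior peak
    are assumed to carry 0.
    """
    max_snr = np.asarray(max_snr, dtype=float)
    return np.mean(max_snr > threshold)
```

Applied to the no-wiggle ensemble with the observed S/N as threshold, this gives the chance of a signal at least that strong arising without a BAO feature.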

3.4.3. Systematic errors 

The Mexican-hat wavelet transform is a natural measure of
the second derivative at position $r_b$, if the function it is
applied to is smoothed over a scale $s$. In fact, arguably a peak
in the Mexican-hat wavelet coefficient would be a good
definition of the peak position itself, if at an appropriate scale.
An alternative definition is the actual local maximum of the
peak in the linear correlation function. However, this
definition can cause a bias if the peak is on top of a slope, as
mildly occurs in the case of a $\Lambda$CDM correlation function.
Fig. 14 shows $\xi_{\rm camb}(r)$ computed using a camb power
spectrum, along with the difference between that and a no-wiggle
$\xi_{\rm nowig}(r)$. Both are measured from density fields with exactly
the Fourier amplitudes prescribed by their linear power
spectra. The peaks of $\xi_{\rm camb}(r)$, and of the difference, are at 109.5
and 110.3 $h^{-1}$ Mpc, a shift of almost 1 $h^{-1}$ Mpc.

The means in the peak distributions shown in Figs. 4 and 5
are within ~1 $h^{-1}$ Mpc of both peaks shown in Fig. 14. If the
peak is defined as the actual local maximum of the peak in
the linear correlation function,
the wavelet estimator tends to overestimate the peak location
by about 1 $h^{-1}$ Mpc.

Again, the small systematic errors found here come from 
purely Gaussian simulations, without nonlinearities or galaxy 
bias, which could generate their own systematic errors, per- 
haps of order the statistical error bars in our sample. 

4. MEASUREMENTS FROM THE MILLENNIUM SIMULATION 

We compare our results from the observations with
simulations of large-scale structure formation. We used
Millennium-simulation (MS) galaxies to investigate realistic redshift
distortions of galaxy samples. However, its box size
(500 $h^{-1}$ Mpc), smaller than the size of our sample, and the
fact that we only have a single realization, make it insufficient 
to derive strong conclusions about the detailed statistics of 
BAO measurements. So, we also employed Gaussian simula- 
tions to get a better idea of the correlations involved, keeping 
in mind that non-linearities would likely degrade significances 
and constraints we measure. 



Figure 15 shows various correlation functions of galaxies
from the MS, measured on a $256^3$ grid. The galaxies used
were brighter than an absolute $r$-magnitude of -20 as
modeled by De Lucia & Blaizot (2007), giving a mean density of
0.02 ($h^{-1}$ Mpc)$^{-3}$.



The curves shown are averages of slice correlation
functions $\xi$ over all possible orientations, axes, and slice locations.
For the real-space $\xi$'s, the grid was split into 64 slices of
thickness 7.8125 $h^{-1}$ Mpc along all three Cartesian axes, giving
3 × 64 $\xi$'s to average together. In the middle two columns,
linear redshift-space distortions were generated analytically,
using ($\beta$ = 0.46, $\sigma$ = 0) and ($\beta$ = 0.46, $\sigma$ = 3 $h^{-1}$ Mpc) in the
Kaiser formula

$$P_s(k, \mu) = P_r(k)\,\frac{(1 + \beta\mu^2)^2}{1 + (k\sigma\mu)^2},$$

where $\mu = \cos\theta$. In the right column, redshift-space
distortions were generated using the actual MS galaxy velocities,
not an approximate model. In each case, the result is an av- 
erage over all possible permutations among the three axes of 
the LOS and the two axes along which the slices were cut. 
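
As an illustration of the analytic distortion model, the sketch below evaluates a Kaiser-type anisotropic factor with a Lorentzian small-scale damping term. The exact functional form, $(1 + \beta\mu^2)^2 / [1 + (k\sigma\mu)^2]$, is an assumption about the formula intended here, and the function name is hypothetical.

```python
import numpy as np

def redshift_space_factor(k, mu, beta=0.46, sigma=0.0):
    """Multiplicative factor on the real-space power spectrum (assumed form).

    k     : wavenumber in h/Mpc
    mu    : cosine of the angle between the wavevector and the LOS
    beta  : linear distortion parameter
    sigma : damping scale in Mpc/h (sigma = 0 gives pure linear distortions)
    """
    kaiser = (1.0 + beta * mu**2) ** 2
    damping = 1.0 + (k * sigma * mu) ** 2
    return kaiser / damping
```

For modes perpendicular to the LOS ($\mu = 0$) the factor is unity, while along the LOS the boost $(1 + \beta)^2$ is reduced by the damping term when $\sigma > 0$.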

Generally, linear-theory redshift-space distortions in 2D
slices do appear to sharpen the baryon bump in $\xi$
relative to real space. Indeed, as we find below with Gaussian
simulations, linear redshift-space distortions seem not only to
sharpen the peak, but to increase its robustness too, at least
when it is analyzed using wavelets. At some level, one
expects fingers of God (FoG) to degrade all features in
$\xi(\pi, r_p)$, even away from the LOS. Interestingly, though, at
least in the MS, the 1D curves do not change much between
the two bottom-middle plots, suggesting that the fogginess
effected by FoG may be minor.

We applied the wavelet analysis to the redshift-space MS
simulations, as well. Fig. 16 shows the mean S/N from these
samples. With flat weighting in angle, there is a 3 $h^{-1}$ Mpc
shift relative to the Gaussian peak location. It would be
tempting to use this as an estimate of the systematic error from
ing a simulation with full non-linearities and galaxy forma- 
tion, but it could very well be a statistical fluctuation instead. 

Fig. 17 shows measurements of $G_0(r_b, s)\,\xi(r_b, s; \pi, r_p)$ for
MS galaxies, and in linear theory. In these figures, the
values of $r_b$ and $s$ are those at the peaks in the signal-to-noise
ratio of the wavelet transform $\langle G_2(r_b, s)\,\xi(\pi, r_p)\rangle$.

5. MEASUREMENTS FROM THE SDSS 

5.1. The Sample 

We analyzed the SDSS DR7 (Abazajian et al. 2009)
main-galaxy sample (MGS; Strauss et al. 2002). These are all
galaxies observed by SDSS that have an $r$-band Petrosian
magnitude $r < 17.77$. Importantly for our analysis, their redshifts
have been measured spectroscopically. We selected
galaxies designated as sciencePrimary, from the Northern cap of
SDSS, in stripes 9 through 37, with a redshift confidence
>0.9, and redshift error <0.1. We also cut galaxies from the
tails of the selection function, including only galaxies with
distance 100 $h^{-1}$ Mpc < $r$ < 750 $h^{-1}$ Mpc. This gave us a
total of 527,781 galaxies to begin with. Furthermore, there were
several regions with 'holes' and small, incompletely sampled 
regions in stripes 13, 29, 35 and 36. We removed all objects 
in these incomplete areas. This left us with a total of 527,362 
objects. 

We have computed comoving radial distances $r$ for each
object, expressed in $h^{-1}$ Mpc, using built-in functions in
the SDSS SkyServer database (Taghizadeh-Popp 2010), with
$\Omega_m = 0.279$, $\Omega_\Lambda = 0.721$, and $w_0 = -1$. The angular
coordinates were converted to Cartesian, and rotated to a
coordinate system whose $z$-axis was along the SDSS North Pole,
at ra=185, dec=32.5. Each of the normal vectors was then
multiplied by the computed radial distance to give us a 3D
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Fig. 15. — Correlation functions of a sample of galaxies modeled within the Millennium simulation (MS). For reference, the 3D real-space correlation function 
is shown in dotted black in all lower panels. The colored solid curves in the bottom panels show the angular dependence of $\xi(\pi, r_p)$, shown with its full 2D
structure in the top panels. The 2 angular bins include angles within 6° of the LOS (red) and of the direction perpendicular to it (green).
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Fig. 16. — The signal-to-noise ratio of the wavelet transform 
$\langle G_2(r_b, s)\,\xi(\pi, r_p)\rangle$ in the MS simulation. In the MS, the variances are
measured among slices. The left column uses a flat weighting in $\theta$, while the right
column includes only $\theta$ < 6°. With flat and LOS weightings, the peaks are at
($r_b$, $s$) = (110, 20) and (113, 21), respectively.

location. 

We computed a smooth polynomial fit to the redshift
distribution, $dn(r)/dr$, using a 6th order polynomial in $r$. The
curve is shown in Fig. 18. We used the angular selection
mask, defined by boundaries of the selected stripes and the
censored 'holes', and this analytic radial distribution to
generate 17M random galaxies with the correct geometric
properties. Each random galaxy had an additional random number
precomputed, making it easy to select decimated subsets for
further analyses.
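
Drawing random radial distances from a polynomial fit to $dn(r)/dr$ can be sketched with inverse-CDF sampling, as below. The function name, the tabulation grid, and the use of inverse-CDF sampling itself are assumptions for illustration; the angular mask is not treated here.

```python
import numpy as np

def sample_radii(poly_coeffs, r_min, r_max, size, seed=0, n_grid=4096):
    """Draw radii with density proportional to a polynomial dn/dr.

    poly_coeffs follow numpy.polyval ordering (highest power first).
    """
    rng = np.random.default_rng(seed)
    r = np.linspace(r_min, r_max, n_grid)
    pdf = np.clip(np.polyval(poly_coeffs, r), 0.0, None)
    # Trapezoidal cumulative integral, normalized to run from 0 to 1.
    steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(r)
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    cdf /= cdf[-1]
    radii = np.interp(rng.uniform(size=size), cdf, r)
    # Extra uniform tag per random point: selecting tag < f keeps a fraction f.
    tags = rng.uniform(size=size)
    return radii, tags
```

The second returned array plays the role of the precomputed random number mentioned in the text: thresholding it yields a decimated subset without re-drawing the catalog.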

These two data sets were then stored in two database tables. 
We created a simple function that was able to take an arbitrary 
rotation of the samples around the z-axis. We stepped the ro- 
tation angle in 15° increments, from 0° to 165°, for a total 
of 12 angular orientations. For each orientation we extract 
2.5° thick slices, resembling the original SDSS stripes, except 
for the rotation. We eliminated slices which contained only a 



small fraction of the data, located at the edges of the survey, 
i.e. slices of width < 20°. Slices that were wider than 80° 
were split in half across, so that no slice exceeded the width 
of 80°. This gave us 661 slices. 
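
The coordinate handling described in this subsection (conversion to Cartesian, rotation so that the $z$-axis points to ra=185, dec=32.5, and stepping the rotation about $z$ in 15° increments) can be sketched as follows. Function names are hypothetical, and the sign conventions of the rotations are assumptions.

```python
import numpy as np

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def radec_to_unit(ra_deg, dec_deg):
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    return np.stack([np.cos(dec) * np.cos(ra),
                     np.cos(dec) * np.sin(ra),
                     np.sin(dec)], axis=-1)

def to_survey_frame(ra_deg, dec_deg, r, pole_ra=185.0, pole_dec=32.5):
    """3D positions in a frame whose z-axis points to (pole_ra, pole_dec)."""
    to_pole = rot_y(np.radians(pole_dec) - np.pi / 2) @ rot_z(-np.radians(pole_ra))
    unit = radec_to_unit(np.asarray(ra_deg, float), np.asarray(dec_deg, float))
    return np.asarray(r, float)[..., None] * (unit @ to_pole.T)

def rotate_about_z(xyz, angle_deg):
    return xyz @ rot_z(np.radians(angle_deg)).T

# 0, 15, ..., 165 degrees: the 12 slicing orientations.
slicing_angles_deg = np.arange(0, 180, 15)
```

Each orientation would then be cut into 2.5° slices; that step depends on the survey geometry and is not sketched here.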

5.2. Computing the Correlations 
We used the Landy & Szalay (1993) estimator to
estimate the correlations, for its optimal behavior (Kerscher et al.
2000):

$$\xi = \frac{DD - 2DR + RR}{RR}, \qquad (37)$$

where DD, DR and RR are numbers of pairs of random and 
actual (data) galaxies, in length bins of 

Measuring the distances among all random and data
galaxies is inherently an $N^2$ problem. The current state-of-the-art
solution by Moore et al. (2001) involves binary trees built on
the datasets, and uses a dual-tree traversal algorithm. The idea
is to speed up the procedure by checking distance constraints
on pairs of tree-nodes that represent 3-D boxes. If all pairs
of points coming from the cells fit in a single bin, one can
increment the counts and stop going deeper on that branch.
For low-resolution 1-D statistics, such as the angular
correlation functions measured in a dozen logarithmic bins, the
above procedure can indeed increase the performance
tremendously, especially when one is interested in small-scale
clustering and able to discard early all pairs of large separations.
Our problem is more difficult: here we measure the
two-dimensional redshift-space $\pi$-$r_p$ correlation function out to the
largest scales at high resolution in 800 × 800 = 640 000 bins
of spacing 0.5 Mpc. (To eliminate null pixels in the region of
interest, however, we ended up degrading the resolution to 2
Mpc.) The dual-tree code slows down in this high-bin limit
and becomes essentially as expensive as the brute-force naive
method.
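
For concreteness, the brute-force approach and the Landy & Szalay estimator can be sketched on the CPU as below. This is not the GPU code described in the text. The decomposition of each pair separation into $\pi$ (along the direction to the pair midpoint) and $r_p$ (perpendicular to it), the observer at the origin, and the function names are all assumptions.

```python
import numpy as np

def pair_counts_2d(a, b, edges, auto=False):
    """Brute-force ordered pair counts binned in (pi, r_p).

    a, b : (N, 3) and (M, 3) positions with the observer at the origin.
    auto : set True when a and b are the same catalog, to drop self-pairs.
    """
    sep = a[:, None, :] - b[None, :, :]
    mid = 0.5 * (a[:, None, :] + b[None, :, :])
    los = mid / np.linalg.norm(mid, axis=-1, keepdims=True)
    pi = np.abs(np.sum(sep * los, axis=-1))
    rp = np.sqrt(np.clip(np.sum(sep * sep, axis=-1) - pi**2, 0.0, None))
    keep = np.ones(pi.shape, dtype=bool)
    if auto:
        keep &= ~np.eye(len(a), dtype=bool)
    counts, _, _ = np.histogram2d(pi[keep], rp[keep], bins=[edges, edges])
    return counts

def landy_szalay(data, randoms, edges):
    """(DD - 2 DR + RR) / RR with pair counts normalized per ordered pair."""
    nd, nr = len(data), len(randoms)
    dd = pair_counts_2d(data, data, edges, auto=True) / (nd * (nd - 1))
    dr = pair_counts_2d(data, randoms, edges) / (nd * nr)
    rr = pair_counts_2d(randoms, randoms, edges, auto=True) / (nr * (nr - 1))
    with np.errstate(divide="ignore", invalid="ignore"):
        return (dd - 2.0 * dr + rr) / rr
```

Memory and time both scale as the product of the catalog sizes, which is why this form is only practical for small test catalogs and why the text turns to massively parallel hardware.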
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Fig. 17. — The top panel shows the linear-theory redshift- space correla- 
tion function, flattened at ($r_b$, $s$) = (110, 10) $h^{-1}$ Mpc, as measured from
2D slices from Gaussian simulations discussed in Sect. 3.3. The bottom
panel shows the MS galaxy redshift-space correlation function flattened at
($r_b$, $s$) = (113, 21) $h^{-1}$ Mpc. They are flattened at the same $r_b$, $s$ as their
respective peaks in wavelet S/N, measured with flat weighting. For comparison
with other figures, they show $\sinh^{-1}(300\xi)$ instead of $\xi$, although here $\xi$ is
small, so the $\sinh^{-1}$ transform hardly changes the plot's appearance.




Fig. 18. — The radial distribution function $dn(r)/dr$ of the MGS sample
we used, normalized to $\int n(r)\,dr = 1$ (blue), and a polynomial fit (red). The
polynomial fit was used to generate our random samples. We also show the
near and far cuts for the full sample, and the near cut for the high-z sample.
The large upward fluctuation at about 220 $h^{-1}$ Mpc is produced by the Sloan
Great Wall.
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Fig. 19. — The correlation function r„) measured from the SDSS main- 
galaxy sample. The left panel is measured from the 100-750 $h^{-1}$ Mpc full
sample, and the right panel is from the 300-750 $h^{-1}$ Mpc high-z sample. The
high-z sample is only a bit smaller (losing 6% by volume), but it excludes the 
Sloan Great Wall, a structure whose inclusion is known to alter clustering 
statistics substantially. The high-z sample appears grainier than the full sam- 
ple because of increased shot noise. 

Our solution is to implement the counting on modern graph- 
ics processing units (GPUs) that offer hundreds of cores and 
run tens of thousands of threads simultaneously on commer- 
cial video cards. We use NVIDIA's Compute Unified Device 
Architecture (CUDA) to implement the parallel correlation 
function code in the C++ programming language and inte- 
grate it with SQL Server where the data reside. Using SQL 
wrapper routines, we run the analysis directly on GTX 295 
GPUs without temporary intermediate file storage. The
performance is hundreds of times faster on this parallel
architecture when compared to today's CPUs, which is not too
surprising given the large number of arithmetic logic units (480 ALUs)
on these cards. The results from the GPUs are returned in
database tables, stored and further analyzed in SQL to com- 
pute the final correlation functions. 

The 2D correlation function was computed for each slice,
with a 2 $h^{-1}$ Mpc resolution, out to 570 $h^{-1}$ Mpc in each
direction. A total of 400 trillion galaxy pairs were computed,
including both real and random points.



5.3. Results from the SDSS 



Fig. 19 shows $\xi(\pi, r_p)$ from two samples: first, the full
sample, including all galaxies from 100-750 $h^{-1}$ Mpc, and second, a
high-redshift sample, 300-750 $h^{-1}$ Mpc. Along the LOS, the full
sample has two prominent peaks: one at about 97 $h^{-1}$ Mpc,
and a second at about 170 $h^{-1}$ Mpc. The first could be
associated with a BAO feature, but the second could not, given
plausible priors on the BAO scale from previous measurements
(e.g. Eisenstein et al. 2005).

We analyzed the higher-redshift sample because the mea- 
surement gives perhaps undue weight to the densely sampled 
structures at low redshift, such as the Sloan Great Wall.
Including the Sloan Great Wall, which is at a distance of about
220 $h^{-1}$ Mpc (Gott et al. 2005), is known to change clustering
statistics substantially (e.g. Nichol et al. 2006), producing an
upward fluctuation in Fig. 18 of 20%.

A better solution might be to reduce the additional cosmic
variance from such a structure by down-weighting it, e.g. with
a Gaussianizing density mapping (Neyrinck et al. 2009), but
for the present paper we simply cut the near part of the
sample, within 300 $h^{-1}$ Mpc, and apply an additional weighting
to each galaxy. By volume, this high-z sample is only 6%
smaller than the full one, although it only contains about
half the galaxies, 261,737. The effective volume including
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Fig. 20. — The spatial weighting function used in our measurement, derived 
from the minimum variance estimator. 



shot noise (Tegmark 1997),

$$V_{\rm eff}(k) = \int \left[\frac{n(r)\,P(k)}{1 + n(r)\,P(k)}\right]^2 dV, \qquad (38)$$

for the full sample is 0.29 ($h^{-1}$ Gpc)$^3$. For the high-z
sample, $V_{\rm eff}$ = 0.27 ($h^{-1}$ Gpc)$^3$, a 9% difference. Here we use
$P(k = 0.1\,h\,{\rm Mpc}^{-1}) = 7000$ ($h^{-1}$ Mpc)$^3$ (from the linear
power spectrum used in the Gaussian simulations, at about the
BAO scale in wavenumber), assuming a $\pi$-steradian survey.

The optimum (minimum variance) spatial weighting for
galaxies at radial distance $r$, on the clustering scale $L$, is given
by (Kaiser 1986)

$$w(r, L) = \frac{1}{1 + 4\pi n(r) J_3(L)}. \qquad (39)$$

For the SDSS main-galaxy sample, $J_3(110) \approx 30{,}000$.
Using the selection function of the galaxies we can compute the
weight corresponding to the 110 $h^{-1}$ Mpc scale. We then fit a
third-order polynomial to the log of the weight function, and
use this analytic expression in the further analysis.
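
The weight of Eq. (39) and the third-order polynomial fit to its logarithm can be sketched as follows. The function names are hypothetical, and the exponential number density used in the usage lines is a toy stand-in for the measured selection function.

```python
import numpy as np

def kaiser_weight(n_r, j3):
    """Minimum-variance weight w = 1 / (1 + 4 pi n(r) J3(L))."""
    return 1.0 / (1.0 + 4.0 * np.pi * np.asarray(n_r, dtype=float) * j3)

def fit_log_weight(r, w, order=3):
    """Fit a polynomial to log(w) and return a callable analytic weight."""
    coeffs = np.polyfit(r, np.log(w), order)
    return lambda x: np.exp(np.polyval(coeffs, x))

# Toy example: an assumed exponentially declining density over 300-750 Mpc/h.
r = np.linspace(300.0, 750.0, 50)
n_toy = 1e-2 * np.exp(-r / 150.0)
w = kaiser_weight(n_toy, 3.0e4)
w_fit = fit_log_weight(r, w)
```

Fitting the logarithm keeps the analytic weight positive everywhere and handles the large dynamic range between the near and far edges of the sample.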

Fig. 20 shows the spatial weighting function. Due to the
large change in the weight function from the near to far edge,
we experimented with different choices for the weighting,
applied to each galaxy: (a) uniform weighting, (b) $w^{1/2}$, (c)
$w$. We found that (c) results in a substantially increased shot
noise, since there is too much weight added to the small
number of objects at the far edge of the volume, while there is very
little difference between (a) and (b). As a result, we adopt (b),
the square-root-weighted high-z sample, as our fiducial one
for analysis. Indeed, cutting galaxies closer than 300 $h^{-1}$ Mpc
does entirely remove the 170 $h^{-1}$ Mpc feature.

Figure 21 shows flat angle-averaged, and LOS, correlation 
functions for MS and Gaussian samples, and for the high-z 
SDSS sample. No galaxy-bias factor was applied to the Gaus- 
sian curves. 

In each case, $\xi$ is measured among 2D slices, and then
averaged together. The error bars are rather different in size; this
is because of the different sample volumes. Also, as usual in 
discussions of the correlation function, we should note that 
the error bars are somewhat correlated. 

The Gaussian error bars are the simulation-to-simulation
standard deviations of $\xi$ measured in 2D slices (using only
one slicing orientation) and then averaged together. In the
MS and SDSS cases, the error bars are estimated from within 
the sample of slice correlation functions, requiring an esti- 
mate of the number of degrees of freedom (DOF) among the 
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Fig. 21. — Angle-averaged correlation functions for Gaussian simulations, 
galaxies in the MS, and from the high-z SDSS sample. The top panel shows
the $\xi$ averaged uniformly over angle, and the bottom uses only data within 6°
of the LOS. The error bars are discussed in the text.

correlated slices. The square root of this DOF is the factor by 
which we divide the slice-by-slice dispersion to get the plotted 
error bars. 

In the MS case, the flat-angular-weighting DOF is the
number of slices (128 = 2 axes × 64 slices per axis) divided by 1.2,
the same factor used below in Section 5.4.1 from the
Gaussian simulations to account for the additional cosmic variance
from going from ensembles of slices to ensembles of
simulations. The DOF in the LOS case gets divided by an additional,
conservative factor of two because many LOS pairs of
galaxies are present in slices along both slicing directions. We were
particularly careful for the SDSS sample in estimating the
reduction in DOF from correlated slicing orientations; see Sect.
5.4 below for details.
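
The error-bar recipe for the MS case reduces to a few lines of arithmetic, sketched here. The numbers are those quoted in the text; the function name and array layout are assumptions.

```python
import numpy as np

def error_bars(slice_xi, dof):
    """Slice-to-slice dispersion divided by the square root of the effective DOF.

    slice_xi: array (n_slices, n_bins) of per-slice correlation functions.
    """
    return np.std(slice_xi, axis=0, ddof=1) / np.sqrt(dof)

n_slices = 2 * 64            # 2 slicing axes, 64 slices per axis
dof_flat = n_slices / 1.2    # reduction for slice correlations and cosmic variance
dof_los = dof_flat / 2.0     # extra conservative factor of two along the LOS
```

With these values the flat-weighting DOF is about 107 and the LOS DOF about 53, so the LOS error bars are larger by a factor of $\sqrt{2}$ for the same slice-to-slice dispersion.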

The flat-angular-weighting $\xi$'s in the SDSS and MS
samples are strikingly similar in shape. In fact, we do not
essarily expect the samples to match, since the galaxy prop- 
erties or number densities do not match between the MS and 
SDSS samples. We consider the high degree of agreement to 
be largely by chance. 
One feature that seems quite solid, though, is a LOS trough
at $\pi \approx 55$ $h^{-1}$ Mpc (see also Fig. below), that is much
deeper than the Gaussian simulations. Multiplying the
sian curve by some linear bias factor could perhaps make 
these troughs line up better, but this would cause disagreement 
at larger r, and would require quite a large bias factor, which 
we do not expect for this relatively low-luminosity sample. 
The depth of the trough is a sign that non-linear infall and/or 
galaxy bias clear out this region along the LOS much more 
dramatically than in linear theory. We speculate that in the 
SDSS case, one reason for the strong peak along the LOS is a 
pile-up of these cleared-out galaxies. 

5.4. Effective degrees of freedom 
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Fig. 22. — Top row: for flat and LOS weightings, the mean overestimation 
factors of the S/N levels from estimating the S/N within a sample of correlated
slices, i.e. $\langle (S/N)_{\rm slices} \rangle / (S/N)_{\rm slicings}$. Outside a "peak region" (with a mean
S/N at least half of the maximum S/N), the plotted value is set to zero. Bottom
row: histograms of the overestimation factor, drawing from all simulations
and from all ($r_b$, $s$) within the peak region.

While our slicing strategy enables an estimate of error bars 
and signal significance, this estimate is rather complicated, 
since slices overlap, and even when they do not, nearby slices 
are likely correlated with each other. If all slices were sta- 
tistically independent, the degrees of freedom (DOF) would 
be the number of slices, 661. We investigate two factors by 
which this number must be reduced: first, a factor coming 
from slice correlations and cosmic variance; second, a factor 
from overlap due to the 12 different angular slicings. 

5.4.1. Correlated slices and cosmic variance 

We typically expect an overestimate in the significance 
level of a detection when estimating it from within the sample. 
We investigate this effect on Gaussian simulations to which 
linear redshift distortions have been applied. 

As in Section 3.3, we sliced Gaussian simulations (box
size 768 $h^{-1}$ Mpc, cell size 2 $h^{-1}$ Mpc) into 16 $h^{-1}$ Mpc slices.
Here, we take just one slicing per simulation. From each
simulation, we estimate $(S/N)_{\rm slices}$ of the wavelet coefficient
at each ($r_b$, $s$) by measuring the mean and standard deviation
among slices within the simulation. In this case, we multiply
the S/N by $\sqrt{N_{\rm slices}}$, where $N_{\rm slices}$ is the DOF of the single
slicing in the approximation that all slices are independent. We also measure
$(S/N)_{\rm slicings}$ more properly, measuring the mean and standard
deviation of averages of slicings of different simulations. The
latter estimate, which includes the effects of slice-to-slice
correlation and cosmic variance, is the same as those performed
in Section 3.3, except that here, one (instead of two) slicing
orientation is used.
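
The two S/N estimates compared here can be sketched at a single ($r_b$, $s$) as follows. The array layout and function name are assumptions; the synthetic usage lines only illustrate that a component shared by all slices of a simulation (a stand-in for cosmic variance) pushes the ratio above one.

```python
import numpy as np

def overestimation_factor(coeffs):
    """Mean over simulations of (S/N)_slices / (S/N)_slicings.

    coeffs: array (n_sims, n_slices) of per-slice wavelet coefficients
    at one (r_b, s).
    """
    n_sims, n_slices = coeffs.shape
    per_sim_mean = coeffs.mean(axis=1)
    # Within-simulation estimate, treating slices as independent.
    snr_slices = per_sim_mean / coeffs.std(axis=1, ddof=1) * np.sqrt(n_slices)
    # Ensemble estimate from the scatter of slicing averages across simulations.
    snr_slicings = per_sim_mean.mean() / per_sim_mean.std(ddof=1)
    return np.mean(snr_slices / snr_slicings)

rng = np.random.default_rng(11)
independent = rng.normal(loc=0.5, scale=1.0, size=(400, 48))
shared = independent + rng.normal(scale=0.2, size=(400, 1))
ratio_independent = overestimation_factor(independent)
ratio_shared = overestimation_factor(shared)
```

The 48 slices per simulation correspond to a 768 $h^{-1}$ Mpc box cut into 16 $h^{-1}$ Mpc slices.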

Fig. 22 shows the means and distributions of the
ratio of $(S/N)_{\rm slices}$, which is different for each simulation, to
$(S/N)_{\rm slicings}$. We expect it to exceed one on average, since
estimating the significance among slices should give an
overestimate. In the top row, the dark regions are within the "peak
region," defined as having a $(S/N)_{\rm slicings}$ of at least half the
maximum $(S/N)_{\rm slicings}$. We focused on this peak region to avoid
dividing by small numbers when taking the ratio. The bottom
row shows histograms of this ratio over all simulations, and
over all ($r_b$, $s$) in the peak region. Results are shown for both
'flat' and LOS angular weighting.

Fig. 23. — Dashed, the ratio between the number of data-data pairs measured
in 2D (slicing and projecting), and in 3D, without any slicing. Because of the
projection from 3D onto the plane of the slice, this ratio exceeds 12 (the
number of slicings) at small separation. For the calculation of the DOF, we
truncate this function at 12 (solid).

Although we have used simulations with about the same
volume as the SDSS sample, again they are purely Gaussian
simulations, with linear redshift distortions. The slice-to-slice
variance in the wavelet coefficient is surely underestimated
using them. However, in Gaussian simulations, the
simulation-to-simulation variance is likely underestimated as
well, so we expect their ratio (which is what actually enters
our SDSS analysis) not to be underestimated as severely as
the slice-to-slice variance in the wavelet coefficient.

The mean of this ratio of standard deviations in the peak
region is only a bit over 1; the mean in both LOS and flat
weightings is 1.1. Thus we adopt $1.1^2 = 1.21$ as the DOF
reduction factor in going from slice-to-slice variance to
simulation-to-simulation variance. We should note that there
is a large dispersion in this factor, which simply means that
a given simulation can have a much larger or smaller signal
than the mean would give.

5.4.2. Correlated slicing orientations 

The factor by which we would overestimate the DOF by assuming
all slicings to be independent is $N_{\rm orient}/N^{\rm eff}_{\rm orient}$,
where $N_{\rm orient}$ is the number of slicings, and
$N^{\rm eff}_{\rm orient}$ is the number of slicings including degeneracy.
Roughly, $N_{\rm orient}/N^{\rm eff}_{\rm orient} = N_{\rm occupied}$,
the mean number of slices occupied by a pair of galaxies. We
estimate $N_{\rm occupied}$ in two ways: first, by counting
galaxy pairs that go into the 2D and 3D correlation functions.
Second, we relate the problem to that of Buffon's needle
(Buffon 1777).

To estimate $N^{\rm eff}_{\rm orient}$ in a brute-force fashion, we add up the
raw number of galaxy pairs (DD), counted in all 661 2D
slices, as a function of $(r, \theta)$ in the $(\pi, r_p)$ plane. We also
measure the pairs in 3D, and compare them. Using a range in $r$ of
100-120 $h^{-1}$ Mpc, we count the pairs in 1°-wide bins. Fig. 23
shows the ratio of the 2D to 3D pairs. In the limit of small sky
separation, we expect all pairs to enter 12 slices. The plotted
ratio exceeds 12 at small separation because of the slices' 2D
projection, which is important for $\theta < 2.5°$ (the slice
width): a pair's 3D $\theta_{\rm sky}$ gets projected onto the plane of the
slice, resulting in a smaller 2D $\theta_{\rm sky}$. Also note that at large
$\theta$, the fraction dips below 1. This means that the 12 slicings
were not enough to catch all pairs at large angles.

To estimate $N_{\rm occupied}$ for the two weightings, we average the
plotted ratio from 0 to 6° (LOS), and from 0 to 90° (flat).
Since pairs cannot inhabit over 12 slices, for this average we
truncate the curve at 12. Averaging this truncated curve gives
$N_{\rm occupied} = 9.2$ (LOS) and $N_{\rm occupied} = 1.7$ (flat).
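
The truncate-then-average step can be written compactly. The sketch below is illustrative only: the function name and the toy numbers in the usage note are hypothetical, and an unweighted mean over equal-width angular bins is assumed, since the text does not specify the bin weighting.

```python
import numpy as np

def mean_occupancy(theta_deg, ratio, theta_max_deg, n_orient=12):
    """Average the 2D/3D pair-count ratio from 0 to theta_max_deg,
    after capping it at the number of slicing orientations.

    theta_deg: bin centres in degrees (assumed equal-width bins).
    ratio:     measured 2D-to-3D pair-count ratio in each bin.
    """
    theta_deg = np.asarray(theta_deg, dtype=float)
    ratio = np.asarray(ratio, dtype=float)
    capped = np.minimum(ratio, n_orient)   # a pair cannot occupy > n_orient slices
    keep = theta_deg <= theta_max_deg      # LOS: 6 deg; flat: 90 deg
    return capped[keep].mean()
```

For example, with toy bin centres `[0.5, 1.5, 2.5, 3.5]` and toy ratios `[14, 12, 10, 2]`, averaging out to 4° caps the first bin at 12 and returns 9.0.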

Another, analytical way of estimating the number of
slices on average that a pair of galaxies inhabits is related to
the problem of Buffon's needle, which can be stated as follows:
Let $P_{\rm Buffon}(\ell/t)$ denote the probability that a needle of
length $\ell$, dropped randomly on a piece of paper with parallel
lines separated by a distance $t$, will cross at least one line.
Buffon found that

$$P_{\rm Buffon}(x \equiv \ell/t) = \frac{2}{\pi}\,x, \quad x < 1. \qquad (40)$$

The expression is more complicated if $x > 1$, but we do not
use that case here.

Associating galaxy pairs with "needles," and slice edges
with the parallel lines on the paper, a galaxy pair will inhabit
a slice if it does not cross any lines. Imagining a slice
orientation as a random rotation and translation of the lined
paper, the fraction of slice orientations for which a galaxy pair
(of separation $\theta_{\rm sky}$ on the sky) appears in a slice (of
angular thickness $t$) will be $1 - P_{\rm Buffon}(\theta_{\rm sky}/t)$. With 12
orientations, the average number of slices occupied per galaxy pair
will be $N_{\rm occupied}(\theta_{\rm sky}) = 12\,[1 - P_{\rm Buffon}(\theta_{\rm sky}/t)]$.

If the angle in the $\xi(\pi, r_p)$ plot were the angle on the sky,
we would simply average $N_{\rm occupied}(\theta_{\rm sky})$ over a range of
$\theta_{\rm sky}$ to get the average LOS galaxy-pair-slice occupancy. We work
in $(\pi, r_p)$ coordinates though, so we estimate a maximum $\theta_{\rm sky}$
around the BAO scale along the LOS to be about 1/3.5 of
the maximum angle in $(\pi, r_p)$. This 1/3.5 factor is a typical
ratio of their LOS separation ($\sim 100\,h^{-1}$ Mpc) to the LOS
distance to the observer ($\sim 350\,h^{-1}$ Mpc). Averaging
$12\,[1 - P_{\rm Buffon}(\theta_{\rm sky}/t)]$ over angles from 0 to 6°/3.5 gives
$N_{\rm occupied} = 9.4$ slices, in accord with the previous estimate
based on pair counts.
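
As a numerical check of this estimate, the average can be evaluated directly from Eq. (40). The sketch below assumes the slice thickness $t = 2.5°$ quoted earlier and a uniform average over $\theta_{\rm sky}$; the function names are illustrative.

```python
import numpy as np

def p_buffon(x):
    """Buffon crossing probability for needle-to-spacing ratio x <= 1 (Eq. 40)."""
    x = np.asarray(x, dtype=float)
    if np.any(x > 1):
        raise ValueError("only the x <= 1 case is implemented")
    return 2.0 * x / np.pi

def mean_slices_occupied(theta_max_deg, thickness_deg=2.5, n_orient=12, n_grid=10001):
    """Average n_orient * [1 - P_Buffon(theta/t)] uniformly over 0..theta_max."""
    theta = np.linspace(0.0, theta_max_deg, n_grid)
    occupancy = n_orient * (1.0 - p_buffon(theta / thickness_deg))
    return occupancy.mean()

n_occ = mean_slices_occupied(6.0 / 3.5)  # maximum sky angle of 6 deg / 3.5
print(round(n_occ, 1))                   # 9.4
```

Because the integrand is linear in $\theta_{\rm sky}$, the average equals its value at the midpoint of the range, $12\,[1 - (2/\pi)(0.857/2.5)] \approx 9.4$.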

To illustrate the degree of independence of $\xi$ among the 12
slice orientations, we plot $\xi$ for each orientation, and also its
mean. Fig. 25 shows this for LOS and flat weightings. As
expected from the arguments in this Section, there is much
more variance with flat weighting than LOS, since the galaxy
pairs along the LOS occupy slices of many slice orientations.

6. WAVELET ANALYSIS OF THE SDSS RESULTS 

We apply the same wavelet-based peak finder described in
the previous sections to the high-redshift SDSS results. Fig.
24 shows the results. The top panels show the S/N from this
sample, with LOS and flat angular weighting, over the same
range (i.e. Bayesian prior) in $(r_b, s)$ as in the Gaussian
simulations. Here the raw inter-slice S/N is divided by the effective
number of degrees of freedom (DOF), discussed in Section
5.4. If the slices were independent, the DOF would simply be
the number of slices, 661. But this needs to be reduced by a
factor from slice correlation and cosmic variance (which we
estimate to be 1.2), and a factor from physical slice overlaps
from the multiple slicing orientations (which we estimate to
be 9.2 for LOS weighting, and 1.7 for flat weighting). With
this correction, we estimate the peak S/N of the LOS peak to
be significant at the 2.2$\sigma$ level, and the peak with flat angular
weighting at the 4.0$\sigma$ level. As in the previous result plots,
we weight each slice by its width.
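
The DOF bookkeeping can be made explicit. The sketch below assumes the two reduction factors simply divide the number of slices, which is one natural reading of the text; the function name is illustrative, and 1.21 is the unrounded variance factor from Section 5.4.

```python
def effective_dof(n_slices, variance_factor, n_occupied):
    """Number of slices reduced by the slice-correlation/cosmic-variance
    factor and by the mean number of slices each galaxy pair occupies.
    (Assumed multiplicative combination of the two factors.)
    """
    return n_slices / (variance_factor * n_occupied)

dof_los = effective_dof(661, 1.21, 9.2)   # LOS weighting
dof_flat = effective_dof(661, 1.21, 1.7)  # flat weighting
```

Under this reading, the 661 slices correspond to roughly 59 effective DOF for LOS weighting and roughly 321 for flat weighting.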

The bottom panels show 2D histograms of peak locations 
from 3000 bootstrap-resamplings of 661 slices (with replace- 
ment) from the 661 slices. Because we use the same number 
of slices in the bootstrap samples as in the measurement, this 
gives an estimate of the actual uncertainty in the peak posi- 
tions, estimated from within the sample itself. 
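
A bootstrap of this kind can be sketched as follows. This is a simplified illustration rather than the analysis code: it works on a 1D grid of peak locations at a fixed wavelet scale instead of the full $(r_b, s)$ plane, and the function name, array layout, and S/N definition (mean over standard error) are assumptions.

```python
import numpy as np

def bootstrap_peak_locations(slice_values, r_grid, n_boot=3000, seed=0):
    """Peak locations from bootstrap resamplings of slices.

    slice_values: (n_slices, n_r) array, one wavelet coefficient per
                  slice at each location in r_grid (fixed scale s).
    Returns an array of n_boot peak locations.
    """
    rng = np.random.default_rng(seed)
    slice_values = np.asarray(slice_values, dtype=float)
    r_grid = np.asarray(r_grid, dtype=float)
    n_slices = slice_values.shape[0]
    peaks = np.empty(n_boot)
    for i in range(n_boot):
        # Draw as many slices as in the measurement, with replacement.
        idx = rng.integers(0, n_slices, size=n_slices)
        sample = slice_values[idx]
        noise = sample.std(axis=0, ddof=1) / np.sqrt(n_slices)
        snr = sample.mean(axis=0) / noise
        peaks[i] = r_grid[np.argmax(snr)]
    return peaks
```

The standard deviation of the returned peak locations then serves as the within-sample error bar on the peak position.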

Fig. 24. — Top row: signal-to-noise plots showing the estimated
significances of bumps in the high-redshift SDSS sample, as a function of $(r_b, s)$.
Bottom row: 2D histograms of the locations of S/N peaks, among 3000
bootstrap-resamplings.

The peak locations, with error bars thus estimated from
within the sample, are $109.7 \pm 4.7\,h^{-1}$ Mpc (LOS), and
$113.6 \pm 3.5\,h^{-1}$ Mpc (flat). These error bars consider some cosmic
variance, but only that found within the sample. To this, we
add in quadrature the error bars from cosmic variance outside
the sample, estimated from the Gaussian simulations (which
give error bars of $8\,h^{-1}$ Mpc in LOS, and $2\,h^{-1}$ Mpc with
flat weighting). This gives estimates of $109.7 \pm 9.3\,h^{-1}$ Mpc
(LOS) and $113.6 \pm 4.0\,h^{-1}$ Mpc (flat). These do not include
systematic error bars, which above we estimated to be about
$1\,h^{-1}$ Mpc for the Gaussian simulations, but would likely be
larger for fully non-linear simulations.
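
The quadrature sums quoted above can be checked directly; the helper name below is illustrative.

```python
import math

def add_in_quadrature(*errors):
    """Combine independent error bars as the root of the sum of squares."""
    return math.sqrt(sum(e * e for e in errors))

# Within-sample bootstrap error combined with the simulation-based
# cosmic-variance error, both in h^-1 Mpc.
los = add_in_quadrature(4.7, 8.0)
flat = add_in_quadrature(3.5, 2.0)
print(round(los, 1), round(flat, 1))  # 9.3 4.0
```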

The S/N peaks in both the MS and SDSS samples are at
substantially higher $s$ than in the Gaussian simulations. One
possible cause for this is the broadening in the BAO feature that
Lagrangian displacements of order $10\,h^{-1}$ Mpc from
large-scale tidal motions produce (e.g. Eisenstein et al. 2007). Also,
fingers of God likely broaden the peak in the LOS direction.

Of course, the Gaussian simulations we use here are idealized,
missing several complications that are mild but perhaps
not entirely negligible on BAO scales: non-linearities
in matter clustering; shot noise; effects from the non-cubic,
non-periodic survey shape; bias between the galaxy and matter
fields; non-linear redshift distortions, especially fingers of
God; and perhaps large non-Gaussian (co)variance (Rimes
& Hamilton 2005), which however seems a bit milder for
redshift-space galaxies than for real-space matter (Neyrinck
et al. 2010).

To get an iron-clad estimate of the significance, we need 
high-resolution Gpc-scale simulations, that would both en- 
compass a representative volume, and possess full non- 
linearities. This will be part of a follow-up analysis, where 
we will build a few hundred rather high-resolution simula- 
tions, analyzed using realistic, SDSS-like survey geometries. 

In the meantime, because these constraints will likely loosen
in the face of non-linearities, we conservatively adopt
the looser constraint on the BAO peak position, which comes
from the LOS weighting, rounding up the error bar to
$110 \pm 10\,h^{-1}$ Mpc. This error bar is likely larger than any additional
systematic errors from non-linearities. Also, it would
accommodate the peak in the full (including the Sloan Great Wall)
sample, which is at $97\,h^{-1}$ Mpc.

Fig. 25. — Top: $\sinh^{-1}(300\,\xi(\pi, r_p))$ using flat angular weighting in the
$(\pi, r_p)$ plane, for all 12 slicing orientations, and their mean (black, bold).
Bottom: same, for LOS angular weighting.

While 4.0 is a measure of the S/N of a bump in the SDSS
sample at the particular peak $(r_b, s)$ in the S/N plot, the
percentages given at the end of Section 3.4.2 give another
measure of the significance of our peak detection, one that is immune
to a posteriori bias issues. Only 0.2% of no-wiggle simulations
have any bump of S/N > 4.0 that could plausibly be associated
with the BAO feature (i.e. that is between 100 and
$120\,h^{-1}$ Mpc). This implies a 99.8% chance that a bump such
as we see (i.e. anywhere in the vicinity of the expected BAO
location) is truly a BAO feature.
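
The counting behind the 0.2% figure can be sketched as below. This is a hedged illustration: the function name and the array layout of the simulated S/N maps are assumptions, and only the threshold and the 100-120 $h^{-1}$ Mpc window come from the text.

```python
import numpy as np

def false_positive_fraction(snr_maps, r_b, threshold=4.0,
                            r_min=100.0, r_max=120.0):
    """Fraction of no-wiggle simulations with any bump above threshold
    whose location lies in the window around the BAO scale.

    snr_maps: (n_sims, n_rb, n_s) array of wavelet S/N per simulation.
    r_b:      (n_rb,) peak locations in h^-1 Mpc.
    """
    snr_maps = np.asarray(snr_maps, dtype=float)
    r_b = np.asarray(r_b, dtype=float)
    in_window = (r_b >= r_min) & (r_b <= r_max)
    # A simulation counts if any (r_b, s) cell in the window exceeds threshold.
    hits = (snr_maps[:, in_window, :] > threshold).any(axis=(1, 2))
    return float(hits.mean())
```

Scanning the whole window, rather than a single preselected $(r_b, s)$, is what guards this estimate against a posteriori bias.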

In another test, which does not depend at all on Gaussian
simulations, we use the variances among the 12 slicings of the
wavelet transforms of correlation functions, averaged over all
slices within the slicing. Fig. 25 shows these correlation
functions, along with their mean. As expected, the LOS dispersion
is smaller than with flat weighting, since the LOS
measurements are highly correlated from slicing to slicing. In
contrast, the predicted simulation-to-simulation variance in the
LOS is greater than with flat weighting; this is why our stated
S/N level is lower with LOS than with flat weighting.

Fig. 26. — The S/N of the wavelet transform as a function of wavelet peak
location $r_b$, using the slicing-to-slicing dispersion for the noise N. The wavelet
scale is fixed at $s = 20\,h^{-1}$ Mpc, where the peak over all $(r_b, s)$ is found.

Fig. 27. — $\xi_{\rm SDSS}(\pi, r_p)$ flattened, and attenuated with a radial Gaussian
window, at $(r_b = 110, s = 20)$. This visual transformation emphasizes the
information to which the Mexican-hat wavelet is sensitive.

Fig. 26 shows the wavelet transform's S/N as a function of
$r_b$, holding the wavelet scale fixed at $s = 20\,h^{-1}$ Mpc, using
the raw, slicing-to-slicing dispersion for the noise N. This
gives a LOS peak S/N of 7.9, and a flat-weighting S/N of
1.9. These raw values do not yet include the reduction in noise
(by the square root of the effective DOF) that comes from averaging
the results of different slicings together. The effective DOF is the
effective number of independent slicings; these are 12/9.2 = 1.3
(LOS), and 12/1.7 = 7 (flat), bringing the estimated S/N to
9.0 (LOS) and 5.2 (flat). The LOS significance, in particular,
seems ridiculously large, but we must remember that it
is based on only 1.3 effective measurements; including the
error bar (fractionally, $1/\sqrt{\rm DOF}$), the S/N is $9.0 \pm 7.9$. The
slicing-to-slicing estimate of the flat S/N is more meaningful,
$5.2 \pm 2$. We should note that this estimate does neglect
slicing-to-slicing correlation that goes beyond actual overlaps
of galaxy pairs, but we expect this additional correlation to be
small.
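
The arithmetic of this slicing-to-slicing estimate can be sketched as follows; the function name is illustrative. With the rounded inputs quoted in the text, the LOS value of 9.0 is reproduced, while the flat value comes out near 5.0 rather than 5.2, presumably because the quoted raw S/N of 1.9 is itself rounded.

```python
import math

def rescale_snr(raw_snr, n_orient, n_occupied):
    """Scale a raw slicing-to-slicing S/N by sqrt(effective DOF).

    Returns (dof, snr, err), where dof = n_orient / n_occupied is the
    effective number of independent slicings and err is the fractional
    1/sqrt(dof) uncertainty applied to the rescaled S/N.
    """
    dof = n_orient / n_occupied
    snr = raw_snr * math.sqrt(dof)
    err = snr / math.sqrt(dof)
    return dof, snr, err

dof_los, snr_los, err_los = rescale_snr(7.9, 12, 9.2)     # LOS weighting
dof_flat, snr_flat, err_flat = rescale_snr(1.9, 12, 1.7)  # flat weighting
```

Note that with a fractional error of $1/\sqrt{\rm DOF}$, the absolute error bar equals the raw S/N, which is why the LOS estimate reads $9.0 \pm 7.9$.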

Fig. 26 also shows the tremendous significance of the
trough at $r_b \approx 55\,h^{-1}$ Mpc. We speculate that this trough is
cleared out to a greater degree in the presence of a BAO peak,
and this is an interesting issue for further study.

Fig. 27 shows $\xi$ from the SDSS sample, flattened in the
manner of Sec. 3.2, with parameters at the flat-weighting peak
$(r_b, s)$ in Fig. 24.



7. CONCLUSION 

In this paper we have shown how redshift-space distortions
can amplify and modify features in the galaxy correlation
function. In linear theory, the BAO feature sharpens as the
angle approaches the line of sight. While we expect nonlinear
effects such as fingers of God to broaden the BAO feature
along the line of sight (LOS), they do not seem to dampen it
substantially.

We see a compelling LOS "bump" in the SDSS DR7 MGS
correlation function, measured in 2D slices of varying pitch
angle that cover the sample. We analyze the bump using a
Mexican-hat wavelet transform, which has a peak in
signal-to-noise ratio S/N at $(110 \pm 10)\,h^{-1}$ Mpc, marginalized over
the scale of the wavelet. We assess the significance of the
bump by looking at variances of slice correlation functions
within the sample, which presents difficulties because of
slice-to-slice correlations. Nevertheless, at the particular location
and wavelet scale of the peak in wavelet S/N, we tentatively
assess the significance of this peak at $4\sigma$ if the correlation
function is averaged in a flat angular weighting, and at $2.2\sigma$ if
it is averaged only within 6° of the line of sight.
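
To illustrate why a Mexican-hat wavelet acts as a bump detector, here is a minimal 1D sketch. It assumes the standard Mexican-hat (Ricker) form $\psi(x) = (1 - x^2)\,e^{-x^2/2}$ and a simple $1/s$ normalization, which may differ from the definition used earlier in the paper; the function names are illustrative.

```python
import numpy as np

def mexican_hat(x):
    """Standard Mexican-hat (Ricker) profile; zero mean, symmetric."""
    return (1.0 - x**2) * np.exp(-0.5 * x**2)

def wavelet_coefficient(r, xi, r_b, s):
    """Riemann-sum wavelet coefficient of xi(r) at location r_b, scale s.

    r must be a uniformly spaced grid that extends several s beyond r_b.
    """
    r = np.asarray(r, dtype=float)
    xi = np.asarray(xi, dtype=float)
    dr = r[1] - r[0]
    return np.sum(xi * mexican_hat((r - r_b) / s)) * dr / s

r = np.arange(0.0, 220.5, 0.5)                  # toy grid, h^-1 Mpc
bump = np.exp(-0.5 * ((r - 110.0) / 10.0) ** 2)  # toy bump at 110
smooth = 1.0 + 0.01 * r                          # featureless linear trend

w_bump = wavelet_coefficient(r, bump, 110.0, 10.0)      # positive
w_smooth = wavelet_coefficient(r, smooth, 110.0, 10.0)  # consistent with zero
```

Because the wavelet has zero mean and is symmetric, constant and linear trends give no response, so a positive coefficient signals local downward curvature, that is, a bump at that location and scale.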

These significance estimates employ Gaussian simulations, 
and thus are likely a bit optimistic. However, this optimism 
is more modest than it might at first seem; we only use the 
Gaussian simulations to extrapolate the significance estimate 
measured from slices within the sample to the proper, sample- 
to-sample significance estimate. We make another estimate 
that does not use simulations at all. From just the variance of 
the wavelet transform among angular slicing orientations, we 
estimate that the S/N = 9 ± 8 (LOS) and S/N = 5 ± 2 (flat 
weighting). 

Taking into account a posteriori bias, we also ask how
common a bump of the measured significance is in Gaussian
no-wiggle simulations, not just at the peak of the wavelet
S/N, but anywhere in the vicinity of the BAO feature (within
$10\,h^{-1}$ Mpc of $110\,h^{-1}$ Mpc). Evaluating the wavelet using
flat angular weighting, we find that such a bump is only
present in 0.2% of no-wiggle simulations.

It is curious that the LOS BAO feature seems to be stronger 
than apparently expected from simulations, in both the Sloan 
LRG (Gaztanaga et al. 2009) and main-galaxy samples. Its 
presence in both, rather independent, samples suggests that 
this is not a fluke. 

We propose that the $\sin\theta$ weighting that is commonly used,
while sensibly motivated since it follows the distribution of
galaxy pairs in a 3D sample, is likely suboptimal in constraining
power, since it greatly suppresses any LOS signal. An
example of an alternative weighting that is less hostile to the
LOS is the flat weighting we employ, which is natural for the
2D slicing strategy we use.

While our current analysis makes the case that the features
we see are rather statistically significant, a conclusive
determination awaits a planned study employing more-realistic
simulations that include nonlinearities, galaxy bias,
survey-shape effects, etc. This study will not only address our present
result specifically, but we hope will provide a definitive
answer to the question of how BAO constraining power varies with
angle from the LOS.

Our claimed rather high significance level may be surprising
in light of recent analyses casting doubt on the reality of
BAO detections even in the larger, SDSS LRG sample (Kazin
et al. 2010; Cabre & Gaztanaga 2010). In these analyses, if a
correlation function is better-fit by a "no-wiggle" model, there
is deemed not to be a bump. Although we do not quantitatively
compare the methods, our peak detector is likely more
tolerant of modest bumps. For us, essentially there exists a
bump at a certain wavelet scale and location if the correlation
function has a positive wavelet coefficient there, i.e. that the
second derivative of the correlation function, smoothed over
the wavelet scale, is negative.

Even if we have overestimated the significance of our
detection, an important point is that it still may be used for
cosmological constraints. As Cabre & Gaztanaga (2010) emphasize,
the hypothesis test that a BAO feature exists in a given
sample is a separate statistical question from the degree of its
cosmology-constraining power. There is essentially no doubt
that the power spectrum underlying the structure in our Universe
had a BAO feature at the epoch when the cosmic microwave
background was emitted (e.g. Larson et al. 2010),
and if it is not present at the present epoch, that would be a
big cosmological puzzle.

Under the assumption that the power spectrum underlying
the structure in our Universe has a BAO feature, we find that
the appearance of even a low-significance bump gives some
constraining power over the BAO feature's location. For
example, in a Gaussian sample with the volume of ours, if a
clear bump exists in a sample (which happens about half the
time), the bump gives an error bar in the BAO peak location
of $\sim 8\,h^{-1}$ Mpc.